Abstract

We prove a new extremal inequality, motivated by the vector Gaussian broadcast
channel and the distributed source coding with a single quadratic distortion constraint
problems. As a corollary, this inequality yields a generalization of the classical
entropy-power inequality (EPI). As another corollary, this inequality sheds insight into
maximizing the differential entropy of the sum of two dependent random variables.
1 Introduction



Like many other important results in information theory, the classical entropy-power
inequality (EPI) was discovered by Shannon [1] (even though the first rigorous proof was
given by Stam [2] and was later simplified by Blachman [3]). In [1, p. 641], Shannon used
the EPI to prove a lower bound on the capacity of additive noise channels. While this first
application was on a point-to-point scenario, the real value of the EPI showed up much later
in the multiterminal source/channel coding problems where the tension among users of
different interests cannot be resolved by Fano's inequality alone. The most celebrated
examples include Bergmans' solution [4] to the scalar Gaussian broadcast channel problem,
Oohama's solution [5] to the scalar quadratic Gaussian CEO problem, and Ozarow's solution [6]
to the scalar Gaussian two-description problem.

Denote the set of real numbers by $\mathcal{R}$. Let X, Z be two independent random vectors
with densities in $\mathcal{R}^n$. The classical EPI states that

$$
\exp\left[\frac{2}{n}h(\mathbf{X}+\mathbf{Z})\right] \ge \exp\left[\frac{2}{n}h(\mathbf{X})\right] + \exp\left[\frac{2}{n}h(\mathbf{Z})\right].
\tag{1}
$$

Here $h(\mathbf{X})$ denotes the differential entropy of X, and the equality holds if and
only if X, Z are Gaussian with proportional covariance matrices.

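Fix Z to be Gaussian with covariance matrix $\mathbf{K}_Z$. Assume that $\mathbf{K}_Z$ is
strictly positive definite. Consider the optimization problem

$$
\max_{p(\mathbf{x})} \; \left\{ h(\mathbf{X}) - \mu h(\mathbf{X}+\mathbf{Z}) \right\}
\tag{2}
$$

where $\mu \in \mathcal{R}$, and the maximization is over all random vectors X independent
of Z. The classical EPI can be used to show that for any $\mu > 1$, a Gaussian X with a
covariance matrix proportional to $\mathbf{K}_Z$ is an optimal solution of this optimization
problem. This can be done as follows. By the classical EPI,

$$
h(\mathbf{X}) - \mu h(\mathbf{X}+\mathbf{Z}) \le h(\mathbf{X}) - \frac{\mu n}{2}\log\left(\exp\left[\frac{2}{n}h(\mathbf{X})\right] + \exp\left[\frac{2}{n}h(\mathbf{Z})\right]\right).
\tag{3}
$$

For any fixed $a \in \mathcal{R}$ and $\mu > 1$, the function

$$
f(t;a) = t - \frac{\mu n}{2}\log\left(\exp\left[\frac{2t}{n}\right] + \exp\left[\frac{2a}{n}\right]\right)
\tag{4}
$$

is concave in $t$ and has a global maximum at

$$
t = a - \frac{n}{2}\log(\mu - 1).
\tag{5}
$$

Hence the right-hand side of (3) can be further bounded from above as

$$
h(\mathbf{X}) - \frac{\mu n}{2}\log\left(\exp\left[\frac{2}{n}h(\mathbf{X})\right] + \exp\left[\frac{2}{n}h(\mathbf{Z})\right]\right) \le f\left(h(\mathbf{Z}) - \frac{n}{2}\log(\mu-1);\, h(\mathbf{Z})\right).
\tag{6}
$$

The equality conditions of (3) and (6) imply that a Gaussian X with covariance matrix
$(\mu - 1)^{-1}\mathbf{K}_Z$ is an optimal solution of the optimization problem (2).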
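As a sanity check on the covariance structure claimed above, the scalar case can be explored numerically. The following is a minimal sketch, not part of the paper: it restricts attention to Gaussian X (so it says nothing about non-Gaussian competitors), and the values of `var_z` and `mu` are illustrative assumptions. It grid-searches the variance of X that maximizes h(X) - mu h(X + Z) and compares it with the closed form Var(Z)/(mu - 1).

```python
import math

def gaussian_entropy(var):
    """Differential entropy (in nats) of a scalar Gaussian with variance var."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def objective(var_x, var_z, mu):
    """h(X) - mu * h(X + Z) for independent scalar Gaussians X and Z."""
    return gaussian_entropy(var_x) - mu * gaussian_entropy(var_x + var_z)

def best_variance_on_grid(var_z, mu, grid):
    """Return the grid point that maximizes the Gaussian objective."""
    return max(grid, key=lambda v: objective(v, var_z, mu))

# Illustrative values (assumptions, not taken from the paper).
var_z, mu = 2.0, 3.0
closed_form = var_z / (mu - 1.0)               # scalar version of (mu - 1)^{-1} K_Z
grid = [0.001 * k for k in range(1, 20001)]    # candidate variances in (0, 20]
numeric = best_variance_on_grid(var_z, mu, grid)
print(closed_form, numeric)
```

The grid maximizer should agree with the closed form up to the grid resolution, which illustrates why an additional covariance constraint that excludes this particular variance changes the problem.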



Note that in solving the above optimization problem, the classical EPI not only forces 
the optimal solution to be Gaussian, but also imposes a certain covariance structure on 
the Gaussian optimal solution. Hence a natural question to ask is what happens if there 
is an extra covariance constraint such that the original Gaussian optimal solution is no 
longer admissible. In that case, the classical EPI can still be used; however, the equality 
condition may no longer be met by the new optimal Gaussian solution because it may no 
longer have the required proportionality. In particular, one would be interested in finding 
out whether under the extra covariance constraint, a Gaussian X is still an optimal solution 
to optimization problems such as (2).

One particular type of covariance constraint is the following matrix covariance constraint:

$$
\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}.
\tag{7}
$$

Here Cov(X) denotes the covariance matrix of X, "$\preceq$" represents "less than or equal to"
in the positive semidefinite partial ordering of real symmetric matrices, and S is a positive
semidefinite matrix. The reason for considering such a matrix covariance constraint is largely
due to its generality: it subsumes many other covariance constraints including the important
trace constraint.



The focus of this paper is the following slightly more general optimization problem:

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X}+\mathbf{Z}_1) - \mu h(\mathbf{X}+\mathbf{Z}_2) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{8}
$$

where $\mathbf{Z}_1$, $\mathbf{Z}_2$ are Gaussian vectors with strictly positive definite
covariance matrices $\mathbf{K}_{Z_1}$ and $\mathbf{K}_{Z_2}$, respectively, and the
maximization is over all random vectors X independent of $\mathbf{Z}_1$ and $\mathbf{Z}_2$.
As we shall see, such an optimization problem appears naturally when one is to evaluate
certain genie-aided outer bounds on the capacity/rate region for the vector Gaussian
broadcast channel and the distributed source coding with a single quadratic distortion
constraint problems. Our main result is summarized in the following theorem.

Theorem 1 For any $\mu \ge 1$ and any positive semidefinite S, a Gaussian X is an optimal
solution of the optimization problem (8).

The rest of the paper is organized as follows. In Section 2, we prove our main result.
We give two proofs: a direct proof using the classical EPI, and a strengthened proof
following the perturbation approach of Stam [2] and Blachman [3]. In Section 3, we discuss
some ramifications of the main result. In Section 4, we apply our main result to the vector
Gaussian broadcast channel and the distributed source coding with a single quadratic
distortion constraint problems. For the former problem, our main result leads to an exact
characterization of the capacity region. Finally, in Section 5, we conclude by summarizing
our contribution in the context of the applications of information theoretic inequalities in
resolving multiterminal transmission/compression problems.



2 Proofs of the Main Result 



2.1 A Direct Proof 

In this first proof, we show that the classical EPI can be appropriately used to give a direct
proof of Theorem 1. The fact that the classical EPI is relevant here is not surprising,
considering that the objective function of the optimization problem (8) involves the entropy
of the sum of two independent random vectors. Nonetheless, based on our discussion in
Section 1, a direct use of the classical EPI might be loose because the covariance matrix of
the optimal Gaussian solution might not have the required proportionality.

Our approach to resolve this issue is inspired by the mathematical import of an interesting
technique, called enhancement, introduced by Weingarten et al. [7]. Our proof combines the
idea of enhancement with the worst additive noise lemma [8], [9, Lemma II.2] stated as
follows.



Lemma 2 (Worst additive noise lemma) Let Z be a Gaussian vector with covariance
matrix $\mathbf{K}_Z$, and let $\mathbf{K}_X$ be a positive semidefinite matrix. Consider the
following optimization problem:

$$
\begin{array}{ll}
\min_{p(\mathbf{x})} & I(\mathbf{Z}; \mathbf{Z}+\mathbf{X}) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) = \mathbf{K}_X,
\end{array}
\tag{9}
$$

where $I(\mathbf{Z}; \mathbf{Z}+\mathbf{X})$ denotes the mutual information between Z and
Z + X, and the minimization is over all random vectors X independent of Z. A Gaussian X is an
optimal solution of this optimization problem (regardless of whether $\mathbf{K}_X$ and
$\mathbf{K}_Z$ are proportional).



The details of the direct proof are in Appendix A.



2.2 A Perturbation Proof 

From the optimization theoretic point of view, the power of the classical EPI lies in its
ability to find global optima in nonconvex optimization problems such as (2). Hence one can
imagine that a proof of the classical EPI cannot be accomplished by any local optimization
techniques. Indeed, in their classical proof Stam [2] and Blachman [3] used a perturbation
approach, which amounts to finding a monotone path from any distributions of the participating
random vectors (i.e., X and Z in (1)) to the optimal distributions (Gaussian distributions
with proportional covariance matrices) for which the classical EPI holds with equality. The
monotonicity guarantees that any distributions along the path satisfy the desired inequality,
and hence so do the ones to begin with. A different perturbation was later used by Dembo et al.
[10, p. 1509]. The main idea, however, remains the same as that of Stam and Blachman.



Proving monotonicity needs isoperimetric inequalities. In the case of the classical EPI, it
needs the classical Fisher information inequality (FII) [10, Theorem 13]. Fisher information
is an important quantity in statistical estimation theory. An interesting estimation theoretic
proof using the data processing inequality for Fisher information was given by Zamir [11].
(The classical FII can also be proved by using the standard data processing inequality for
mutual information, invoking a connection between Fisher information and mutual information
explicitly established by Guo et al. [12, Corollary 2].) This connection between the EPI
and the FII is usually thought of as the estimation view of the classical EPI.

We can use the perturbation idea to give a stronger proof of Theorem 1. We construct a
monotone path using the "covariance-preserving" transformation, which was previously used
by Dembo et al. [10, p. 1509] in their perturbation proof of the classical EPI. To prove the
monotonicity, we need the following results on the Fisher information matrix.

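Lemma 3 Denote by $\mathbf{J}(\mathbf{X})$ the Fisher information matrix of random vector X.

1. (Cramer-Rao inequality) For any random vector U (of which the Fisher information
matrix is well defined) with a strictly positive definite covariance matrix,

$$
\mathbf{J}(\mathbf{U}) \succeq \mathrm{Cov}^{-1}(\mathbf{U}).
\tag{10}
$$

2. (Matrix FII) For any independent random vectors U, V and any square matrix A,

$$
\mathbf{J}(\mathbf{U}+\mathbf{V}) \preceq \mathbf{A}\mathbf{J}(\mathbf{U})\mathbf{A}^{T} + (\mathbf{I}-\mathbf{A})\mathbf{J}(\mathbf{V})(\mathbf{I}-\mathbf{A})^{T}.
\tag{11}
$$

Here I is the identity matrix.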
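The matrix FII can be illustrated in the simplest setting. The sketch below is an illustration only, under the assumption that U and V are independent scalar Gaussians, for which the Fisher information is the reciprocal of the variance (so the Cramer-Rao inequality holds with equality). It evaluates the gap between the right-hand and left-hand sides of the scalar FII over a range of weights `a`, and at the weight `a_tight` where the bound is met with equality; the variances are arbitrary illustrative values.

```python
def fisher_information_gaussian(var):
    """Fisher information (location family) of a scalar Gaussian: 1 / variance."""
    return 1.0 / var

def fii_right_hand_side(a, var_u, var_v):
    """Scalar form of A J(U) A^T + (I - A) J(V) (I - A)^T with A = a."""
    j_u = fisher_information_gaussian(var_u)
    j_v = fisher_information_gaussian(var_v)
    return a * a * j_u + (1.0 - a) ** 2 * j_v

# Illustrative variances (assumptions, not taken from the paper).
var_u, var_v = 1.5, 0.5

# J(U + V) for independent Gaussians: the variances add.
lhs = fisher_information_gaussian(var_u + var_v)

# The gap RHS - LHS should be nonnegative for every weight a.
gaps = [fii_right_hand_side(k / 100.0, var_u, var_v) - lhs for k in range(-200, 301)]

# Weight at which the scalar Gaussian bound holds with equality.
a_tight = var_u / (var_u + var_v)
tight_gap = fii_right_hand_side(a_tight, var_u, var_v) - lhs
print(min(gaps), tight_gap)
```

The freedom to choose the matrix A is what makes the inequality useful in the perturbation argument: different choices of A trade off the two Fisher information terms.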

For completeness, a proof of the above lemma using the properties of the score function is
provided in Appendix B. The details of the perturbation proof are in Appendix C.



3 Ramifications of the Main Result 



In this section, we discuss two special cases of the optimization problem (8) to demonstrate
the breadth of our main result. We term these two scenarios the degraded case and the
extremely-skewed case. By considering the degraded case, we prove a generalization of the
classical EPI. By considering the extremely-skewed case, we establish a connection between
our result and the classical result of Cover and Zhang [13] on the maximum differential
entropy of the sum of two dependent random variables.



3.1 The Degraded Case 

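In the degraded case, we have either $\mathbf{K}_{Z_1} \preceq \mathbf{K}_{Z_2}$ or
$\mathbf{K}_{Z_1} \succeq \mathbf{K}_{Z_2}$. First consider the case
$\mathbf{K}_{Z_1} \preceq \mathbf{K}_{Z_2}$. We have the following results.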



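Corollary 4 Let $\mathbf{Z}_1$, Z be two independent Gaussian vectors with covariance matrices
$\mathbf{K}_{Z_1}$ and $\mathbf{K}_Z$, respectively. Assume that $\mathbf{K}_{Z_1}$ is strictly
positive definite. Consider the following optimization problem:

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X}+\mathbf{Z}_1) - \mu h(\mathbf{X}+\mathbf{Z}_1+\mathbf{Z}) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{12}
$$

where the maximization is over all random vectors X independent of $\mathbf{Z}_1$ and Z. For
any $\mu \in \mathcal{R}$ and any positive semidefinite S, a Gaussian X is an optimal solution
of this optimization problem.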



Proof. For $\mu \ge 1$, the corollary is a special case of Theorem 1 with
$\mathbf{Z}_2 = \mathbf{Z}_1 + \mathbf{Z}$. For $\mu \le 0$, the corollary also holds because
$h(\mathbf{X}+\mathbf{Z}_1)$ and $h(\mathbf{X}+\mathbf{Z}_1+\mathbf{Z})$ are simultaneously
maximized when X is Gaussian with covariance matrix S. This leaves only the case
$\mu \in (0,1)$, which we prove next.

The objective function of the optimization problem (12) can be written as

$$
(1-\mu)\,h(\mathbf{X}+\mathbf{Z}_1) - \mu\, I(\mathbf{Z}; \mathbf{X}+\mathbf{Z}_1+\mathbf{Z}).
\tag{13}
$$

Here $h(\mathbf{X}+\mathbf{Z}_1)$ is maximized when X is Gaussian with covariance matrix S. By
the worst additive noise result of Lemma 2, $I(\mathbf{Z}; \mathbf{X}+\mathbf{Z}_1+\mathbf{Z})$
is minimized when X is Gaussian. Further, within the Gaussian class, the one with the full
covariance matrix S minimizes $I(\mathbf{Z}; \mathbf{Z}+\mathbf{X}+\mathbf{Z}_1)$.
For $\mu \in (0,1)$, both $\mu$ and $1-\mu$ are positive. We conclude that the objective
function (13) is maximized when X is Gaussian with covariance matrix S. This completes the
proof. □



Corollary 5 Let Z be a Gaussian vector with covariance matrix $\mathbf{K}_Z$. Assume that
$\mathbf{K}_Z$ is strictly positive definite. Consider the following optimization problem:

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X}) - \mu h(\mathbf{X}+\mathbf{Z}) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{14}
$$

where the maximization is over all random vectors X independent of Z. For any
$\mu \in \mathcal{R}$ and any positive semidefinite S, a Gaussian X is an optimal solution of
this optimization problem.



Observe that the optimization problem (14) is simply a constrained version of the
optimization problem (2). Recall from Section 1 that the optimization problem (2) can be solved
by the classical EPI. Conversely, it can be shown that the special case of the classical EPI
with one of the participating random vectors (say, Z in (1)) fixed to be Gaussian can also be
obtained from the fact that a Gaussian X is an optimal solution of the optimization problem
(2). This can be done as follows. Let $\mathbf{X}^*_G$ denote the optimal Gaussian solution of
(2). Choosing

$$
\mu = 1 + \exp\left[\frac{2}{n}\left(h(\mathbf{Z}) - h(\mathbf{X})\right)\right],
\tag{15}
$$

we have from (5) that

$$
h(\mathbf{X}^*_G) = h(\mathbf{Z}) - \frac{n}{2}\log(\mu - 1) = h(\mathbf{X}).
\tag{16}
$$

Since $\mathbf{X}^*_G$ is an optimal solution of the optimization problem (2) (recall that
$\mathbf{X}^*_G$ has a special covariance structure of being proportional to $\mathbf{K}_Z$),
we have

$$
h(\mathbf{X}) - \mu h(\mathbf{X}+\mathbf{Z}) \le h(\mathbf{X}^*_G) - \mu h(\mathbf{X}^*_G+\mathbf{Z}).
\tag{17}
$$

Substituting (16) into (17), we have $h(\mathbf{X}+\mathbf{Z}) \ge h(\mathbf{X}^*_G+\mathbf{Z})$
for any random vector X independent of Z and satisfying $h(\mathbf{X}) = h(\mathbf{X}^*_G)$.
This is precisely the Costa-Cover form of the classical EPI [10, Theorem 6], so we have proved
the converse statement.

In light of the above statements, Corollary 5 can be thought of as a generalization of the
classical EPI. For technical reasons, we were not able to prove Corollary 5 directly from
Corollary 4 by letting $\mathbf{K}_{Z_1}$ vanish. Instead, we can resort to arguments (direct
and perturbation ones) similar to those for Theorem 1 to prove Corollary 5. Observe that in
the optimization problem (14) the lower constraint $\mathbf{K}_X \succeq 0$ never bites, so no
enhancement is needed in the perturbation proof. The details of the proof are omitted from the
paper.

We now turn to the other degraded case where $\mathbf{K}_{Z_1} \succeq \mathbf{K}_{Z_2}$.
Consider the optimization problem

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X}+\mathbf{Z}_2+\mathbf{Z}) - \mu h(\mathbf{X}+\mathbf{Z}_2) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{18}
$$

where the maximization is over all random vectors X independent of $\mathbf{Z}_2$ and Z. For
any $\mu \ge 1$, by Theorem 1, a Gaussian X is an optimal solution of this optimization
problem. For $\mu \le 0$, this is also true because $h(\mathbf{X}+\mathbf{Z}_2+\mathbf{Z})$ and
$h(\mathbf{X}+\mathbf{Z}_2)$ are simultaneously maximized when X is Gaussian with covariance
matrix S. However, as we shall see next, this is generally not the case for $\mu \in (0,1)$.



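Consider the cases where

$$
0 \prec \frac{\mu}{1-\mu}\mathbf{K}_Z - \mathbf{K}_{Z_2} \prec \mathbf{S}.
\tag{19}
$$

(Note that this can only happen when $\mu \in (0,1)$ and also depends on the realizations of
$\mathbf{K}_Z$, $\mathbf{K}_{Z_2}$ and S.) Under this assumption, we can verify that the
covariance matrix $\mathbf{K}^*_X$ of the optimal Gaussian solution $\mathbf{X}^*_G$ must
satisfy:

$$
\mathbf{K}^*_X = \frac{\mu}{1-\mu}\mathbf{K}_Z - \mathbf{K}_{Z_2}.
\tag{20}
$$

Let X be a non-Gaussian random vector satisfying:

1. $h(\mathbf{X}+\mathbf{Z}_2) = h(\mathbf{X}^*_G+\mathbf{Z}_2)$;

2. $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$.

Such an X exists because, by the assumption, $\mathbf{K}^*_X$ is strictly between 0 and S.
Since X is non-Gaussian, by the Costa-Cover form of the classical EPI, we have

$$
h(\mathbf{X}+\mathbf{Z}_2+\mathbf{Z}) > h(\mathbf{X}^*_G+\mathbf{Z}_2+\mathbf{Z}).
\tag{21}
$$

Since the second term of the objective in (18) is the same for X and $\mathbf{X}^*_G$, X
achieves a strictly larger objective value. We thus conclude that at least for the cases where
the condition (19) holds, the optimal Gaussian solution $\mathbf{X}^*_G$ cannot be an optimal
solution of the optimization problem (18).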



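3.2 The Extremely-Skewed Case

Suppose that $\mathbf{Z}_1$, $\mathbf{Z}_2$ are in $\mathcal{R}^2$. Let

$$
\mathbf{K}_{Z_1} = \mathbf{V}_1 \mathbf{\Sigma}_1 \mathbf{V}_1^{T}, \qquad \mathbf{K}_{Z_2} = \mathbf{V}_2 \mathbf{\Sigma}_2 \mathbf{V}_2^{T},
\tag{22}
$$

where $\mathbf{V}_1$, $\mathbf{V}_2$ are orthogonal matrices and

$$
\mathbf{\Sigma}_1 = \mathrm{Diag}(\lambda_{11}, \lambda_{12}), \qquad \mathbf{\Sigma}_2 = \mathrm{Diag}(\lambda_{21}, \lambda_{22})
\tag{23}
$$

are diagonal matrices. Consider the limiting situation where
$\lambda_{12}, \lambda_{21} \to \infty$, while $\lambda_{11}$, $\lambda_{22}$ are kept fixed.
Compared with the degraded case where $\mathbf{K}_{Z_1}$ dominates $\mathbf{K}_{Z_2}$ in every
possible direction (or vice versa), this situation between $\mathbf{K}_{Z_1}$ and
$\mathbf{K}_{Z_2}$ is extremely skewed. We have the following result.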



Corollary 6 Let Z be a Gaussian random variable, and let $\mathbf{v}_1$, $\mathbf{v}_2$ be two
deterministic vectors in $\mathcal{R}^2$. Consider the optimization problem

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{v}_1^{T}\mathbf{X}+Z) - \mu h(\mathbf{v}_2^{T}\mathbf{X}+Z) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{24}
$$

where the maximization is over all random vectors X (in $\mathcal{R}^2$) independent of Z. For
any $\mu \ge 1$ and any positive semidefinite S, a Gaussian X is an optimal solution of this
optimization problem.

Proof. See Appendix D. □

Next, we use Corollary 6 to solve an optimization problem that involves maximizing the
differential entropy of the sum of two dependent random variables. To put it in perspective,
let us first consider the following simple optimization problem:

$$
\begin{array}{ll}
\max_{p(x_1,x_2)} & h(X_1+X_2) \\
\text{subject to} & \mathrm{Var}(X_1) \le a_1, \; \mathrm{Var}(X_2) \le a_2,
\end{array}
\tag{25}
$$

where $a_1, a_2 \ge 0$ are real numbers, Var(X) denotes the variance of X, and the maximization
is over all jointly distributed random variables $(X_1, X_2)$. The solution to this
optimization problem is clear: $h(X_1+X_2)$ is maximized when $X_1$, $X_2$ are jointly Gaussian
with variance $a_1$ and $a_2$, respectively, and are aligned, i.e.,
$X_1 = \sqrt{a_1/a_2}\,X_2$ almost surely.
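The reason alignment is optimal in the variance-constrained problem is that the variance of the sum is at most $(\sqrt{a_1}+\sqrt{a_2})^2$ by the Cauchy-Schwarz inequality, and a Gaussian maximizes differential entropy for a given variance. The following sketch illustrates this within the jointly Gaussian family only; the values of `a1` and `a2` and the correlation grid are illustrative assumptions.

```python
import math

def gaussian_entropy(var):
    """Differential entropy (in nats) of a scalar Gaussian with variance var."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def sum_variance(a1, a2, rho):
    """Variance of X1 + X2 when Var(X1) = a1, Var(X2) = a2 and the correlation is rho."""
    return a1 + a2 + 2.0 * rho * math.sqrt(a1 * a2)

# Illustrative variance budgets (assumptions, not taken from the paper).
a1, a2 = 4.0, 1.0

# Search over correlation coefficients; rho = 1 corresponds to aligned variables.
rhos = [k / 100.0 for k in range(-99, 101)]
best_rho = max(rhos, key=lambda r: gaussian_entropy(sum_variance(a1, a2, r)))

# Variance of the sum in the aligned case X1 = sqrt(a1 / a2) * X2.
aligned_var = (math.sqrt(a1) + math.sqrt(a2)) ** 2
print(best_rho, aligned_var, gaussian_entropy(aligned_var))
```

Under entropy constraints instead of variance constraints, no such simple variance argument is available, which is what makes the problems discussed next harder.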

Replacing both variance constraints in the optimization problem (25) by entropy
constraints, we have the following optimization problem:

$$
\begin{array}{ll}
\max_{p(x_1,x_2)} & h(X_1+X_2) \\
\text{subject to} & h(X_1) \le a_1, \; h(X_2) \le a_2,
\end{array}
\tag{26}
$$

where $a_1, a_2 \in \mathcal{R}$, and the maximization is over all jointly distributed random
variables $(X_1, X_2)$. Different from the optimization problem (25), a jointly Gaussian
$(X_1, X_2)$ is not always an optimal solution of (26). This can be seen as follows. Consider
the case $a_1 = a_2$. Let $(X^*_{1G}, X^*_{2G})$ be the optimal Gaussian solution of the
optimization problem (26). We have $X^*_{1G} = X^*_{2G}$ almost surely, i.e., $X^*_{1G}$ and
$X^*_{2G}$ are aligned and have the same marginal distribution. Consider all jointly
distributed random variables $(X_1, X_2)$ for which $X_1$, $X_2$ have the same marginal density
function $f$ which satisfies:

1. $h(X_1) = h(X^*_{1G})$;

2. $f$ is not log-concave.

The classical result of Cover and Zhang [13] asserts that among all $(X_1, X_2)$ satisfying the
above conditions, there is at least one that satisfies

$$
h(X_1+X_2) > h(2X_1) = h(2X^*_{1G}) = h(X^*_{1G}+X^*_{2G}).
\tag{27}
$$

We thus conclude that a jointly Gaussian $(X_1, X_2)$ is not always an optimal solution of the
optimization problem (26).

Between (25) and (26) is the following optimization problem:

$$
\begin{array}{ll}
\max_{p(x_1,x_2)} & h(X_1+X_2) \\
\text{subject to} & \mathrm{Var}(X_1) \le a_1, \; h(X_2) \le a_2,
\end{array}
\tag{28}
$$

where $a_1$, $a_2$ are real numbers with $a_1 \ge 0$, and the maximization is over all jointly
distributed random variables $(X_1, X_2)$. The question whether a Gaussian $(X_1, X_2)$ is an
optimal solution of this optimization problem remains, to the best of our knowledge, an open
problem. The following result, however, can be proved using Corollary 6.

Corollary 7 Let Z be a Gaussian variable, and let $a_1$, $a_2$ be real numbers with
$a_1 \ge 0$. Consider the optimization problem

$$
\begin{array}{ll}
\max_{p(x_1,x_2)} & h(X_1+X_2+Z) \\
\text{subject to} & \mathrm{Var}(X_1) \le a_1, \; h(X_2+Z) \le a_2,
\end{array}
\tag{29}
$$

where the maximization is over all jointly distributed random variables $(X_1, X_2)$
independent of Z. A Gaussian $(X_1, X_2)$ is an optimal solution of this optimization problem
for any $a_1 \ge 0$ and any $h(Z) \le a_2 \le a_2^*$, where

$$
a_2^* = \frac{1}{2}\log\left(2\pi e\left(\mathrm{Var}(Z) + \frac{1}{4}\left(\sqrt{a_1 + 4\mathrm{Var}(Z)} - \sqrt{a_1}\right)^2\right)\right).
\tag{30}
$$

Proof. See Appendix E. □
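The threshold in Corollary 7 is straightforward to evaluate. The sketch below is an illustration under stated assumptions: it reads the threshold as one half of log(2 pi e (Var(Z) + r^2)) with r = (sqrt(a1 + 4 Var(Z)) - sqrt(a1)) / 2, uses natural logarithms, and the helper names `threshold_root`, `a2_star` and `noise_entropy` are hypothetical. A purely algebraic observation, independent of the proof in the paper, is that r is the positive root of r^2 + sqrt(a1) r - Var(Z) = 0, which the code checks numerically.

```python
import math

def threshold_root(a1, var_z):
    """r = (sqrt(a1 + 4 Var(Z)) - sqrt(a1)) / 2, the quantity squared inside the threshold."""
    return 0.5 * (math.sqrt(a1 + 4.0 * var_z) - math.sqrt(a1))

def a2_star(a1, var_z):
    """Upper threshold on a2 (in nats), as read from the formula in Corollary 7."""
    r = threshold_root(a1, var_z)
    return 0.5 * math.log(2 * math.pi * math.e * (var_z + r * r))

def noise_entropy(var_z):
    """h(Z) for a scalar Gaussian Z, the lower end of the admissible range of a2."""
    return 0.5 * math.log(2 * math.pi * math.e * var_z)

# Illustrative values (assumptions, not taken from the paper).
a1, var_z = 3.0, 1.0
r = threshold_root(a1, var_z)
residual = r * r + math.sqrt(a1) * r - var_z   # should vanish up to rounding
print(noise_entropy(var_z), a2_star(a1, var_z), residual)
```

As a1 grows, r shrinks toward zero and the admissible interval for a2 narrows toward the single point h(Z); at a1 = 0 the threshold equals the entropy of a Gaussian with variance 2 Var(Z).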



4 Applications in Multiterminal Information Theory 

4.1 The Vector Gaussian Broadcast Channel 

We now use our main result to give an exact characterization of the capacity region of the
vector Gaussian broadcast channel. The capacity region of the vector Gaussian broadcast
channel was first characterized by Weingarten et al. [7].

Consider the following two-user discrete-time vector Gaussian broadcast channel:

$$
\mathbf{Y}_k[m] = \mathbf{X}[m] + \mathbf{Z}_k[m], \qquad k = 1, 2,
\tag{31}
$$

where $\{\mathbf{X}[m]\}$ is the channel input subject to an average matrix power constraint

$$
\frac{1}{N}\sum_{m=1}^{N} \mathbf{X}[m]\mathbf{X}^{T}[m] \preceq \mathbf{S},
\tag{32}
$$

and the noise $\{\mathbf{Z}_k[m]\}$ is i.i.d. Gaussian with zero mean and strictly positive
definite covariance matrix $\mathbf{K}_{Z_k}$ and is independent of $\{\mathbf{X}[m]\}$. The
covariance structure of the Gaussian noise models a scalar Gaussian broadcast channel with
memory. Alternatively, it can also model the downlink of a cellular system with multiple
antennas; this was the motivation of [7].

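A vector Gaussian broadcast channel is in general a nondegraded broadcast channel. An exact
characterization of the capacity region had been a long-standing open problem in
multiterminal information theory, particularly when viewed in the context of a scalar Gaussian
broadcast channel with memory. Prior to [7], only bounds were known. An outer bound,
derived by Marton and Korner [14, Theorem 5], is given by
$\mathcal{O} = \mathcal{O}_1 \cap \mathcal{O}_2$, where $\mathcal{O}_1$ is the set of rate
pairs $(R_1, R_2)$ satisfying

$$
R_1 \le I(\mathbf{X}; \mathbf{Y}_1 | U)
\tag{33}
$$

$$
R_2 \le I(U; \mathbf{Y}_2)
\tag{34}
$$

for some $p(\mathbf{y}_1,\mathbf{y}_2,\mathbf{x},u) = p(\mathbf{y}_1,\mathbf{y}_2|\mathbf{x})p(\mathbf{x},u)$
such that $p(\mathbf{y}_1,\mathbf{y}_2|\mathbf{x})$ is the channel transition matrix and
$p(\mathbf{x})$ satisfies the constraint $E[\mathbf{X}\mathbf{X}^{T}] \preceq \mathbf{S}$, and
$\mathcal{O}_2$ is the set of rate pairs $(R_1, R_2)$ satisfying

$$
R_1 \le I(V; \mathbf{Y}_1)
\tag{35}
$$

$$
R_2 \le I(\mathbf{X}; \mathbf{Y}_2 | V)
\tag{36}
$$

for some $p(\mathbf{y}_1,\mathbf{y}_2,\mathbf{x},v) = p(\mathbf{y}_1,\mathbf{y}_2|\mathbf{x})p(\mathbf{x},v)$
such that $p(\mathbf{y}_1,\mathbf{y}_2|\mathbf{x})$ is the channel transition matrix and
$p(\mathbf{x})$ satisfies the constraint $E[\mathbf{X}\mathbf{X}^{T}] \preceq \mathbf{S}$.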

Next, we derive a tight upper bound on the achievable weighted sum rate

$$
\mu_1 R_1 + \mu_2 R_2,
\tag{37}
$$

using the Marton-Korner outer bound as the starting point. Since a capacity region is always
convex (per time-sharing argument), an exact characterization of all the achievable weighted
sum rates for all nonnegative $\mu_1$, $\mu_2$ provides an exact characterization of the entire
capacity region. First consider the case $\mu_2 \ge \mu_1 > 0$. By the Marton-Korner outer
bound, any achievable rate pair $(R_1, R_2)$ must satisfy:

$$
\mu_1 R_1 + \mu_2 R_2 \le \mu_1 \cdot \max_{p(u,\mathbf{x})} \left\{ I(\mathbf{X};\mathbf{Y}_1|U) + \mu I(U;\mathbf{Y}_2) \right\}
\tag{38}
$$

$$
= \mu_1 \cdot \max_{p(u,\mathbf{x})} \left\{ -h(\mathbf{Z}_1) + \mu h(\mathbf{X}+\mathbf{Z}_2) + \left[ h(\mathbf{X}+\mathbf{Z}_1|U) - \mu h(\mathbf{X}+\mathbf{Z}_2|U) \right] \right\}.
\tag{39}
$$

Here $\mu = \mu_2/\mu_1 \ge 1$, and the maximization is over all $(U, \mathbf{X})$ independent
of $(\mathbf{Z}_1, \mathbf{Z}_2)$ and satisfying the matrix constraint
$E[\mathbf{X}\mathbf{X}^{T}] \preceq \mathbf{S}$. Consider the terms $h(\mathbf{Z}_1)$,
$h(\mathbf{X}+\mathbf{Z}_2)$ and $h(\mathbf{X}+\mathbf{Z}_1|U) - \mu h(\mathbf{X}+\mathbf{Z}_2|U)$
separately. We have

$$
h(\mathbf{Z}_1) = \frac{1}{2}\log\left((2\pi e)^n |\mathbf{K}_{Z_1}|\right)
\tag{40}
$$

and

$$
h(\mathbf{X}+\mathbf{Z}_2) \le \frac{1}{2}\log\left((2\pi e)^n |\mathbf{S}+\mathbf{K}_{Z_2}|\right).
\tag{41}
$$

Further note that maximizing $h(\mathbf{X}+\mathbf{Z}_1|U) - \mu h(\mathbf{X}+\mathbf{Z}_2|U)$
is simply a conditional version of the optimization problem (8). We have the following result,
which is a conditional version of Theorem 1.



Theorem 8 Let $\mathbf{Z}_1$, $\mathbf{Z}_2$ be two Gaussian vectors with strictly positive
definite covariance matrices $\mathbf{K}_{Z_1}$ and $\mathbf{K}_{Z_2}$, respectively. Let
$\mu \ge 1$ be a real number, S be a positive semidefinite matrix, and U be a random variable
independent of $\mathbf{Z}_1$ and $\mathbf{Z}_2$. Consider the optimization problem

$$
\begin{array}{ll}
\max_{p(\mathbf{x}|u)} & h(\mathbf{X}+\mathbf{Z}_1|U) - \mu h(\mathbf{X}+\mathbf{Z}_2|U) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}|U) \preceq \mathbf{S},
\end{array}
\tag{42}
$$

where the maximization is over all conditional distributions of X given U independent of
$\mathbf{Z}_1$ and $\mathbf{Z}_2$. A Gaussian $p(\mathbf{x}|u)$ with the same covariance matrix
for each $u$ is an optimal solution of this optimization problem.



The result of the above theorem has two parts. The part that says a Gaussian
$p(\mathbf{x}|u)$ is an optimal solution follows directly from Theorem 1; the part that says
the optimal Gaussian $p(\mathbf{x}|u)$ has the same covariance matrix for each $u$ is
equivalent to the statement that the optimal value of the optimization problem (8) is a concave
function of S. Despite being a matrix problem, a direct proof of the concavity turns out to be
difficult. Instead, Theorem 8 can be proved following the same footsteps as those for
Theorem 1, except that we need to replace the classical EPI by a conditional version proved by
Bergmans [4, Lemma II]. Let Z be a Gaussian vector. Bergmans' conditional EPI states that

$$
\exp\left[\frac{2}{n}h(\mathbf{X}+\mathbf{Z}|U)\right] \ge \exp\left[\frac{2}{n}h(\mathbf{X}|U)\right] + \exp\left[\frac{2}{n}h(\mathbf{Z})\right]
\tag{43}
$$

for any $(\mathbf{X}, U)$ independent of Z. The equality holds if and only if conditional on
$U = u$, X is Gaussian with a covariance matrix proportional to that of Z and has the same
covariance matrix for each $u$. The details of the proof are omitted from the paper.



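By Theorem 8, we have

$$
h(\mathbf{X}+\mathbf{Z}_1|U) - \mu h(\mathbf{X}+\mathbf{Z}_2|U) \le \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{ \frac{1}{2}\log\left((2\pi e)^n |\mathbf{K}_X+\mathbf{K}_{Z_1}|\right) - \frac{\mu}{2}\log\left((2\pi e)^n |\mathbf{K}_X+\mathbf{K}_{Z_2}|\right) \right\}.
\tag{44}
$$

Substituting (40), (41) and (44) into (39), we obtain

$$
\mu_1 R_1 + \mu_2 R_2 \le \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{ \frac{\mu_1}{2}\log\frac{|\mathbf{K}_X+\mathbf{K}_{Z_1}|}{|\mathbf{K}_{Z_1}|} + \frac{\mu_2}{2}\log\frac{|\mathbf{S}+\mathbf{K}_{Z_2}|}{|\mathbf{K}_X+\mathbf{K}_{Z_2}|} \right\}.
\tag{45}
$$

Note that the weighted sum rates given by (45) can be achieved by dirty-paper coding [16], [17],
so (45) is an exact characterization of all the achievable weighted sum rates for
$\mu_2 \ge \mu_1 > 0$.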

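For $\mu_1 \ge \mu_2 > 0$, we have from the Marton-Korner bound that

$$
\mu_1 R_1 + \mu_2 R_2 \le \mu_2 \cdot \max_{p(\mathbf{x},v)} \left\{ \mu I(V;\mathbf{Y}_1) + I(\mathbf{X};\mathbf{Y}_2|V) \right\}.
\tag{46}
$$

Here $\mu = \mu_1/\mu_2 \ge 1$, and the maximization is over all $(V, \mathbf{X})$ independent
of $(\mathbf{Z}_1, \mathbf{Z}_2)$ and satisfying the matrix constraint
$E[\mathbf{X}\mathbf{X}^{T}] \preceq \mathbf{S}$. Relabeling V as U, the optimization problem
becomes identical to that in (38) with the roles of the two users exchanged. We thus conclude
that

$$
\mu_1 R_1 + \mu_2 R_2 \le \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{ \frac{\mu_1}{2}\log\frac{|\mathbf{S}+\mathbf{K}_{Z_1}|}{|\mathbf{K}_X+\mathbf{K}_{Z_1}|} + \frac{\mu_2}{2}\log\frac{|\mathbf{K}_X+\mathbf{K}_{Z_2}|}{|\mathbf{K}_{Z_2}|} \right\}
\tag{47}
$$

is an exact characterization of all the achievable weighted sum rates for
$\mu_1 \ge \mu_2 > 0$. This settles the problem of characterizing the entire capacity region of
the vector Gaussian broadcast channel.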
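To get a feel for the weighted sum rate bound derived for the case where the second user carries the larger weight, it can be evaluated in the scalar case, where every determinant reduces to a positive number. The sketch below is an illustration under assumptions: the function name `weighted_sum_rate_bound`, the power and noise values, and the use of a plain grid search over the input variance `k` in [0, power] are all choices made here, and rates are in nats.

```python
import math

def weighted_sum_rate_bound(mu1, mu2, power, n1, n2, steps=10000):
    """Grid-search the scalar form of the bound
    max over 0 <= k <= power of
        (mu1 / 2) * log((k + n1) / n1) + (mu2 / 2) * log((power + n2) / (k + n2)),
    where n1 and n2 are the noise variances of users 1 and 2."""
    best = -math.inf
    for i in range(steps + 1):
        k = power * i / steps
        value = (0.5 * mu1 * math.log((k + n1) / n1)
                 + 0.5 * mu2 * math.log((power + n2) / (k + n2)))
        best = max(best, value)
    return best

# Illustrative scalar channel (assumptions, not taken from the paper).
power, n1, n2 = 10.0, 1.0, 2.0

# Equal weights: the sum rate is maximized by giving all power to the stronger user 1.
equal_weights = weighted_sum_rate_bound(1.0, 1.0, power, n1, n2)

# Heavily weighting user 2: the maximizer moves to k = 0, i.e. all power to user 2.
skewed_weights = weighted_sum_rate_bound(1.0, 5.0, power, n1, n2)
print(equal_weights, skewed_weights)
```

Sweeping the weight ratio between these two extremes traces out the boundary of the scalar capacity region, which is the same supporting-hyperplane argument the text applies in the vector case.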



4.2 Distributed Source Coding with a Single Quadratic Distortion 
Constraint 



Our result is also relevant in the following distributed source coding problem. Let
$\{\mathbf{Y}_1[m]\}$, $\{\mathbf{Y}_2[m]\}$ be two i.i.d. vector Gaussian sources with strictly
positive definite covariance matrices $\mathbf{K}_{Y_1}$ and $\mathbf{K}_{Y_2}$, respectively.
At each time $m$, $\mathbf{Y}_1[m]$ and $\mathbf{Y}_2[m]$ are jointly Gaussian.
The encoder is only allowed to perform separate encoding on the sources. The decoder,
on the other hand, can reconstruct the sources based on both encoded messages. We wish
to characterize the entire rate region for which the quadratic distortion for reconstructing
$\{\mathbf{Y}_1[m]\}$ at the decoder satisfies

$$
\frac{1}{N}\sum_{m=1}^{N} \left(\mathbf{Y}_1[m] - \hat{\mathbf{Y}}_1[m]\right)\left(\mathbf{Y}_1[m] - \hat{\mathbf{Y}}_1[m]\right)^{T} \preceq \mathbf{D}.
\tag{48}
$$

(There is no distortion constraint on the source $\{\mathbf{Y}_2[m]\}$.) This is the so-called
distributed source coding with a single quadratic distortion constraint problem.

Note that $\mathbf{Y}_1[m]$, $\mathbf{Y}_2[m]$ are jointly Gaussian, so without loss of
generality we can write

$$
\mathbf{Y}_1[m] = \mathbf{A}\mathbf{Y}_2[m] + \mathbf{Z}[m],
\tag{49}
$$

where A is an invertible matrix and $\mathbf{Z}[m]$ is Gaussian and independent of
$\mathbf{Y}_2[m]$. Since there is no distortion constraint on $\{\mathbf{Y}_2[m]\}$, we can
always assume that $\mathbf{Y}_1[m]$ is a degraded version of $\mathbf{Y}_2[m]$ by relabeling
$\mathbf{A}\mathbf{Y}_2[m]$ as $\mathbf{Y}_2[m]$. In this case, an outer bound can be
obtained similarly to that for the discrete memoryless degraded broadcast channel [19]:

$$
\begin{array}{l}
R_1 \ge I(\mathbf{Y}_1; \hat{\mathbf{Y}}_1 | U) \\
R_2 \ge I(U; \mathbf{Y}_2)
\end{array}
\tag{50}
$$

for some $p(u,\hat{\mathbf{y}}_1,\mathbf{y}_1,\mathbf{y}_2) = p(\hat{\mathbf{y}}_1|u,\mathbf{y}_1)p(u|\mathbf{y}_2)p(\mathbf{y}_1,\mathbf{y}_2)$,
where $p(\mathbf{y}_1,\mathbf{y}_2)$ is the joint distribution of the sources and
$p(\hat{\mathbf{y}}_1|u,\mathbf{y}_1)$ satisfies the matrix constraint
$E[(\mathbf{Y}_1-\hat{\mathbf{Y}}_1)(\mathbf{Y}_1-\hat{\mathbf{Y}}_1)^{T}] \preceq \mathbf{D}$.
The proof is deferred to Appendix F. Next, we derive a lower bound on all the achievable
weighted sum rates $\mu_1 R_1 + \mu_2 R_2$ for all nonnegative $\mu_1$, $\mu_2$, using this
outer bound as the starting point.

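By the outer bound (50), all the achievable rate pairs $(R_1, R_2)$ must satisfy:

$$
\mu_1 R_1 + \mu_2 R_2 \ge \mu_2 \cdot \min_{p(u,\hat{\mathbf{y}}_1|\mathbf{y}_1,\mathbf{y}_2)} \left\{ \mu I(\mathbf{Y}_1;\hat{\mathbf{Y}}_1|U) + I(U;\mathbf{Y}_2) \right\}
\tag{51}
$$

$$
= \mu_2 \cdot \min_{p(u,\hat{\mathbf{y}}_1|\mathbf{y}_1,\mathbf{y}_2)} \left\{ h(\mathbf{Y}_2) - \mu h(\mathbf{Y}_1|\hat{\mathbf{Y}}_1,U) - \left[ h(\mathbf{Y}_2|U) - \mu h(\mathbf{Y}_2+\mathbf{Z}|U) \right] \right\}.
\tag{52}
$$

Here $\mu = \mu_1/\mu_2 \ge 0$, and the minimization is over all
$p(u,\hat{\mathbf{y}}_1|\mathbf{y}_1,\mathbf{y}_2)$ such that U is independent of Z and
$E[(\mathbf{Y}_1-\hat{\mathbf{Y}}_1)(\mathbf{Y}_1-\hat{\mathbf{Y}}_1)^{T}] \preceq \mathbf{D}$
is satisfied. Consider the terms $h(\mathbf{Y}_2)$, $h(\mathbf{Y}_1|\hat{\mathbf{Y}}_1,U)$
and $h(\mathbf{Y}_2|U) - \mu h(\mathbf{Y}_2+\mathbf{Z}|U)$ separately. We have

$$
h(\mathbf{Y}_2) = \frac{1}{2}\log\left((2\pi e)^n |\mathbf{K}_{Y_2}|\right)
\tag{53}
$$

and

$$
h(\mathbf{Y}_1|\hat{\mathbf{Y}}_1,U) = h(\mathbf{Y}_1-\hat{\mathbf{Y}}_1|\hat{\mathbf{Y}}_1,U) \le h(\mathbf{Y}_1-\hat{\mathbf{Y}}_1) \le \frac{1}{2}\log\left((2\pi e)^n |\mathbf{D}|\right).
\tag{54}
$$

Hence we only need to maximize $h(\mathbf{Y}_2|U) - \mu h(\mathbf{Y}_2+\mathbf{Z}|U)$ subject
to the constraints

$$
\mathrm{Cov}(\mathbf{Y}_2|U) \preceq \mathbf{K}_{Y_2} \quad \text{and} \quad \mathrm{Cov}(\mathbf{Y}_1|U) \succeq \mathbf{D}.
\tag{55}
$$

In case the constraint $\mathrm{Cov}(\mathbf{Y}_1|U) \succeq \mathbf{D}$ does not bite, we can
use (a conditional version of) Corollary 5 to show that a Gaussian test channel between
$\mathbf{Y}_2$ and U is a maximizer:

$$
h(\mathbf{Y}_2|U) - \mu h(\mathbf{Y}_2+\mathbf{Z}|U) \le \max_{0 \preceq \mathbf{K} \preceq \mathbf{K}_{Y_2}} \left\{ \frac{1}{2}\log\left((2\pi e)^n |\mathbf{K}|\right) - \frac{\mu}{2}\log\left((2\pi e)^n |\mathbf{K}+\mathbf{K}_{Y_1}-\mathbf{K}_{Y_2}|\right) \right\}.
\tag{56}
$$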
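Substituting (53), (54) and (56) into (52), we have

$$
\mu_1 R_1 + \mu_2 R_2 \ge \min_{0 \preceq \mathbf{K} \preceq \mathbf{K}_{Y_2}} \left\{ \frac{\mu_2}{2}\log\frac{|\mathbf{K}_{Y_2}|}{|\mathbf{K}|} + \frac{\mu_1}{2}\log\frac{|\mathbf{K}+\mathbf{K}_{Y_1}-\mathbf{K}_{Y_2}|}{|\mathbf{D}|} \right\}.
\tag{57}
$$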



On the other hand, this weighted sum rate can be achieved by the following natural Gaussian
separation scheme:

1. Quantize $\{\mathbf{Y}_1[m]\}$ and $\{\mathbf{Y}_2[m]\}$ separately using Gaussian codebooks;

2. Use Slepian-Wolf coding [18] on the quantized version of $\{\mathbf{Y}_1[m]\}$, treating the
quantized version of $\{\mathbf{Y}_2[m]\}$ as decoder side information.

This would have settled the rate region for the distributed source coding with a single
quadratic distortion constraint problem.

Unfortunately, there are indeed instances where the constraint
$\mathrm{Cov}(\mathbf{Y}_1|U) \succeq \mathbf{D}$ cannot be ignored; in such cases, the outer
bound studied here will be strictly inside the inner bound achieved by the natural Gaussian
separation scheme.



5 Concluding Remarks 



The classical EPI is an important inequality with interesting connections to statistical
estimation theory. In information theory, it has been key to the proof of the converse coding
theorem in several important scalar Gaussian multiterminal problems [4], [5], [6]. In the
vector situation, the equality condition of the classical EPI is stringent: the equality
requires the participating random vectors not only to be Gaussian but also to have proportional
covariance matrices. In several instances, this coupling between the Gaussianity and the
proportionality is the main cause that prevents the classical EPI from being directly useful
in extending the converse proof from the scalar case to the vector situation.

In this paper, we proved a new extremal inequality involving entropies of random vectors.
In one special case, this inequality can be seen as a robust version of the classical EPI. By
"robust", we refer to the fact that in the new extremal inequality, the optimality of a
Gaussian distribution does not couple with a specific covariance structure, i.e.,
proportionality. We show that the new extremal inequality is useful in evaluating certain
genie-aided outer bounds for the capacity/rate region for the vector Gaussian broadcast channel
and the distributed source coding with a single quadratic distortion constraint problems.

We offered two proofs of the new extremal inequality: one by appropriately using the
classical EPI, and the other by the perturbation approach of Stam [2] and Blachman [3].
The perturbation approach gives more insights: it takes the problem (via the de Bruijn
identity) to the Fisher information domain where the proportionality no longer seems a
hurdle. Whereas the advantage of the perturbation approach is not crucial for the entropy
inequalities discussed in this paper, it becomes crucial in some other situations [20] where
the enhancement technique of Weingarten et al. does not suffice.
A A Direct Proof of Theorem 1

We now show that the classical EPI can be appropriately used to prove Theorem 1. We first
give the outline of the proof.

Proof Outline. We first show that without loss of generality, we can assume that S is
strictly positive definite. Next, we denote the optimization problem (8) by $P$ and the optimal
value of $P$ by $(P)$. To show that a Gaussian X is an optimal solution of $P$, it is
sufficient to show that $(P) = (P_G)$, where $P_G$ is the Gaussian version of $P$ obtained by
restricting the solution space to Gaussian distributions:

$$
\begin{array}{ll}
\max_{\mathbf{K}_X} & \frac{1}{2}\log\left((2\pi e)^n |\mathbf{K}_X+\mathbf{K}_{Z_1}|\right) - \frac{\mu}{2}\log\left((2\pi e)^n |\mathbf{K}_X+\mathbf{K}_{Z_2}|\right) \\
\text{subject to} & 0 \preceq \mathbf{K}_X \preceq \mathbf{S}.
\end{array}
\tag{58}
$$

Since restricting the solution space can only decrease the optimal value of a maximization
problem, we readily have $(P) \ge (P_G)$. To prove the reverse inequality $(P) \le (P_G)$, we
shall consider an auxiliary optimization problem $\tilde{P}$ and its Gaussian version
$\tilde{P}_G$. In particular, we shall construct a $\tilde{P}$ such that:

$$
(P) \le (\tilde{P}), \qquad (\tilde{P}) = (\tilde{P}_G), \qquad (\tilde{P}_G) = (P_G).
\tag{59}
$$

We will then have $(P) \le (P_G)$ and hence $(P) = (P_G)$.

The proof is rather long, so we divide it into several steps.

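Step 1: $\mathbf{S} \succeq 0$, $|\mathbf{S}| = 0$. We show that for any
$\mathbf{S} \succeq 0$ but $|\mathbf{S}| = 0$, there is an equivalent optimization problem of
type (8) in which the upper bound on the covariance matrix of X is strictly positive definite.

Suppose that the rank of S is $r < n$, i.e., S is rank deficient. Let

$$
\mathbf{S} = \mathbf{Q}_S \mathbf{\Sigma}_S \mathbf{Q}_S^{T},
\tag{60}
$$

where $\mathbf{Q}_S$ is an orthogonal matrix, and

$$
\mathbf{\Sigma}_S = \mathrm{Diag}(\lambda_1, \cdots, \lambda_r, 0, \cdots, 0)
\tag{61}
$$

is a diagonal matrix. For any X with $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$, let
$\bar{\mathbf{X}} = (\bar{\mathbf{X}}_a^{T}, \bar{\mathbf{X}}_b^{T})^{T} = \mathbf{Q}_S^{T}\mathbf{X}$
where $\bar{\mathbf{X}}_a$ is of length $r$. We have

$$
\mathrm{Cov}(\bar{\mathbf{X}}) = \mathbf{Q}_S^{T}\mathrm{Cov}(\mathbf{X})\mathbf{Q}_S \preceq \mathbf{Q}_S^{T}\mathbf{S}\mathbf{Q}_S = \mathbf{\Sigma}_S,
\tag{62}
$$

which implies that $\mathrm{Cov}(\bar{\mathbf{X}}_b) = 0$, i.e., $\bar{\mathbf{X}}_b$ is
deterministic. Without loss of generality, let us assume that $\bar{\mathbf{X}}_b = 0$. So an
optimization over $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$ is the same as an optimization
over $\mathrm{Cov}(\bar{\mathbf{X}}_a) \preceq \mathrm{Diag}(\lambda_1, \cdots, \lambda_r)$.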
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Next, let 



Q's^zMs = ( b' a ) ^^2) 



where Aj, Bj and Cj are submatrices of size r x r, {n — r) x r, and (n — r) x [n — r), 
respectively, and let 

D, ^ ( J -«f < ' ) . ,64) 

We have 

m-s^-{yf''){1')-{1'), (65) 

and 

(66) 
Hence if we let DQ^Zj = (Zj^,Zj^)* where Zj^ is of a length r, then Zj^^ and Zj^^ are 
statistically independent. It follows that 

/l(X + Z,) = /l(DQ*5X + DQ'^Zi) = h{Xa+%,a,%,b) = h{'Ka + %,a) + K%^,). (67) 

So maximizing /i(X + Zi) — /i/i(X + Z2) is equivalent to maximizing /i(Xa + Zi q) — /i/i(Xa + 
Z2,a) + ^(Zi,b) — /i/i(Z2,fe). Note that h{Zij,), i = 1, 2, are constants. Hence to show that (j8]) 
has a Gaussian optimal solution for a rank deficient S, it is sufficient to show that 

maxp(x,) h{'Xa_ + Zi,^) - h^Ka + T^i.a) ,^^^ 

subject to Cov(Xa) -< Diag(Ai, ■ ■ ■ , W 

has a Gaussian optimal solution. Since Diag(Ai, ■ ■ ■ , A,.) now has a full rank, we conclude 
that without loss of generality, we may assume that S in ([S]) is strictly positive definite. 

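Step 2: Construction of $\tilde{P}$. Let $\mathbf{X}_G^*$ be an optimal Gaussian solution of $P$, and let $\mathbf{K}_X^*$ be 
the covariance matrix of $\mathbf{X}_G^*$. Then $\mathbf{K}_X^*$ is an optimal solution to the optimization problem 
(58). Although this conic program is generally nonconvex, it was shown in [7, Lemma 5] 
that for $\mathbf{S} \succ 0$, $\mathbf{K}_X^*$ must satisfy the following KKT-like conditions: 

$$ \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_1})^{-1} + \mathbf{M}_1 = \frac{\mu}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} + \mathbf{M}_2 \tag{69} $$

$$ \mathbf{M}_1 \mathbf{K}_X^* = 0 \tag{70} $$

$$ \mathbf{M}_2 (\mathbf{S} - \mathbf{K}_X^*) = 0, \tag{71} $$

where $\mathbf{M}_1, \mathbf{M}_2 \succeq 0$ are Lagrange multipliers corresponding to $\mathbf{K}_X \succeq 0$ and $\mathbf{K}_X \preceq \mathbf{S}$, respectively. 
Let $\mathbf{K}_{\tilde{Z}_1}$, $\mathbf{K}_{\tilde{Z}_2}$ be two real symmetric matrices satisfying 

$$ \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_1})^{-1} + \mathbf{M}_1 = \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})^{-1} \tag{72} $$

$$ \frac{\mu}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} + \mathbf{M}_2 = \frac{\mu}{2}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_2})^{-1}. \tag{73} $$

We have the following results on $\mathbf{K}_{\tilde{Z}_1}$ and $\mathbf{K}_{\tilde{Z}_2}$, proved in [7, Lemma 11, 12]. 

Lemma 9 For $\mathbf{K}_X^*$, $\mathbf{K}_{Z_i}$, $\mathbf{K}_{\tilde{Z}_i}$, $\mathbf{M}_i$, $i = 1, 2$, related through (69) to (73), and $\mu \geq 1$, we 
have 

$$ 0 \preceq \mathbf{K}_{\tilde{Z}_1} \preceq \mathbf{K}_{Z_1}, \tag{74} $$

$$ \mathbf{K}_{\tilde{Z}_1} \preceq \mathbf{K}_{\tilde{Z}_2} \preceq \mathbf{K}_{Z_2}. \tag{75} $$

The matrices $\mathbf{K}_{\tilde{Z}_1}$, $\mathbf{K}_{\tilde{Z}_2}$ are positive semidefinite, so they can serve as covariance matrices. 
Let $\tilde{\mathbf{Z}}_1$, $\tilde{\mathbf{Z}}_2$ be two Gaussian vectors with covariance matrices $\mathbf{K}_{\tilde{Z}_1}$ and $\mathbf{K}_{\tilde{Z}_2}$, respectively. Let 
us define the auxiliary optimization problem $\tilde{P}$ as: 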



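$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X} + \tilde{\mathbf{Z}}_1) - \mu h(\mathbf{X} + \tilde{\mathbf{Z}}_2) + F \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{76}
$$

where the constant 

$$ F := h(\mathbf{Z}_1) - h(\tilde{\mathbf{Z}}_1) + \mu\left(h(\mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2) - h(\mathbf{X}_G^{(S)} + \mathbf{Z}_2)\right), \tag{77} $$

$\mathbf{X}_G^{(S)}$ is a Gaussian vector with covariance matrix $\mathbf{S}$ and independent of $\mathbf{Z}_2$ and $\tilde{\mathbf{Z}}_2$, and the 
maximization is over all random vectors $\mathbf{X}$ independent of $\tilde{\mathbf{Z}}_1$ and $\tilde{\mathbf{Z}}_2$. 

In [7, p. 3937], the authors call the process of replacing $\mathbf{Z}_1$ and $\mathbf{Z}_2$ with $\tilde{\mathbf{Z}}_1$ and $\tilde{\mathbf{Z}}_2$, 
respectively, enhancement. Next, we show that the auxiliary optimization problem $\tilde{P}$ defined 
in (76) satisfies the desired chain of relationships (59). 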
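
To make the enhancement step concrete, the following is a minimal scalar ($n = 1$) sketch in Python. It is an illustration only, not part of the proof: the function name `scalar_enhancement` and its argument names are hypothetical, and the closed-form solution of the scalar Gaussian problem (58) used here is our own elementary calculus, assuming positive variances `k1`, `k2`, a bound `s > 0`, and `mu >= 1`.

```python
def scalar_enhancement(k1, k2, mu, s):
    """Scalar (n = 1) instance of the KKT-like conditions (69)-(71) and of
    the enhanced variances defined through (72)-(73).

    k1, k2 : variances of Z_1 and Z_2 (positive)
    mu     : weight, mu >= 1
    s      : upper bound on Var(X), s > 0
    Returns (k_star, m1, m2, k1_tilde, k2_tilde).
    """
    def dphi(k):
        # Derivative of (1/2) log(k + k1) - (mu/2) log(k + k2) with respect to k.
        return 0.5 / (k + k1) - 0.5 * mu / (k + k2)

    m1 = m2 = 0.0
    if dphi(0.0) <= 0.0:
        # Lower constraint K_X >= 0 is active; M_1 absorbs the negative slope.
        k_star, m1 = 0.0, -dphi(0.0)
    elif dphi(s) >= 0.0:
        # Upper constraint K_X <= S is active; M_2 absorbs the positive slope.
        k_star, m2 = s, dphi(s)
    else:
        # Interior stationary point (this branch can only occur when mu > 1).
        k_star = (k2 - mu * k1) / (mu - 1.0)

    # Enhanced variances solved from the scalar versions of (72) and (73).
    k1_tilde = 1.0 / (1.0 / (k_star + k1) + 2.0 * m1) - k_star
    k2_tilde = 1.0 / (1.0 / (k_star + k2) + 2.0 * m2 / mu) - k_star
    return k_star, m1, m2, k1_tilde, k2_tilde
```

For example, `scalar_enhancement(1.0, 2.0, 3.0, 1.0)` has an active lower constraint, so only the variance of $\mathbf{Z}_1$ is reduced, while `scalar_enhancement(1.0, 4.0, 1.5, 1.0)` has an active upper constraint, so only the variance of $\mathbf{Z}_2$ is reduced. In both cases the returned values satisfy the scalar versions of the orderings (74) and (75).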

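Step 3: Proof of $(P) \leq (\tilde{P})$. Note that $P$ and $\tilde{P}$ have the same solution space. So to 
show that $(P) \leq (\tilde{P})$, it is sufficient to show that for each admissible solution, the value of 
the objective function of $P$ is less than or equal to that of $\tilde{P}$. 

The difference between the objective functions of $P$ and $\tilde{P}$ can be written as 

$$
\begin{aligned}
& h(\mathbf{X} + \mathbf{Z}_1) - h(\mathbf{Z}_1) - h(\mathbf{X} + \tilde{\mathbf{Z}}_1) + h(\tilde{\mathbf{Z}}_1) \\
& \quad - \mu\left(h(\mathbf{X} + \mathbf{Z}_2) - h(\mathbf{X} + \tilde{\mathbf{Z}}_2) - h(\mathbf{X}_G^{(S)} + \mathbf{Z}_2) + h(\mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2)\right).
\end{aligned}
\tag{78}
$$

By Lemma 9, $\mathbf{K}_{Z_i} \succeq \mathbf{K}_{\tilde{Z}_i}$ for $i = 1, 2$. So we can write $\mathbf{Z}_i = \tilde{\mathbf{Z}}_i + \hat{\mathbf{Z}}_i$, where $\hat{\mathbf{Z}}_i$ is a Gaussian 
vector independent of $\tilde{\mathbf{Z}}_i$. We have 

\begin{align}
h(\mathbf{X} + \mathbf{Z}_1) - h(\mathbf{Z}_1) - h(\mathbf{X} + \tilde{\mathbf{Z}}_1) + h(\tilde{\mathbf{Z}}_1) &= I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_1) - I(\mathbf{X}; \mathbf{X} + \tilde{\mathbf{Z}}_1) \tag{79} \\
&= I(\mathbf{X}; \mathbf{X} + \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}_1) - I(\mathbf{X}; \mathbf{X} + \tilde{\mathbf{Z}}_1) \tag{80} \\
&\leq 0, \tag{81}
\end{align}

where the inequality is due to the Markov chain 

$$ \mathbf{X} \rightarrow \mathbf{X} + \tilde{\mathbf{Z}}_1 \rightarrow \mathbf{X} + \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}_1. \tag{82} $$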



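Further, let $\mathbf{X}_G$ be a Gaussian random vector with the same covariance matrix as that of $\mathbf{X}$. 
Assume that $\mathbf{X}_G$ is independent of $\mathbf{Z}_2$ and $\tilde{\mathbf{Z}}_2$. Note that both $\mathbf{X}_G$ and $\mathbf{X}_G^{(S)}$ are Gaussian 
and that 

$$ \mathrm{Cov}(\mathbf{X}_G) = \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S} = \mathrm{Cov}(\mathbf{X}_G^{(S)}). \tag{83} $$

So we can write $\mathbf{X}_G^{(S)} = \mathbf{X}_G + \hat{\mathbf{X}}_G$, where $\hat{\mathbf{X}}_G$ is a Gaussian random vector independent of 
$\mathbf{X}_G$. We have 

\begin{align}
& h(\mathbf{X} + \mathbf{Z}_2) - h(\mathbf{X} + \tilde{\mathbf{Z}}_2) - h(\mathbf{X}_G^{(S)} + \mathbf{Z}_2) + h(\mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2) \notag \\
&\quad = h(\mathbf{X} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) - h(\mathbf{X} + \tilde{\mathbf{Z}}_2) - \left(h(\mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) - h(\mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2)\right) \tag{84} \\
&\quad = I(\hat{\mathbf{Z}}_2; \mathbf{X} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) - I(\hat{\mathbf{Z}}_2; \mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) \tag{85} \\
&\quad \geq I(\hat{\mathbf{Z}}_2; \mathbf{X}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) - I(\hat{\mathbf{Z}}_2; \mathbf{X}_G^{(S)} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) \tag{86} \\
&\quad = I(\hat{\mathbf{Z}}_2; \mathbf{X}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) - I(\hat{\mathbf{Z}}_2; \mathbf{X}_G + \hat{\mathbf{X}}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) \tag{87} \\
&\quad \geq 0, \tag{88}
\end{align}

where inequality (86) follows from 

$$ I(\hat{\mathbf{Z}}_2; \mathbf{X} + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2) \geq I(\hat{\mathbf{Z}}_2; \mathbf{X}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2), \tag{89} $$

which is due to the worst noise result of Lemma 2, and inequality (88) follows from the 
Markov chain 

$$ \hat{\mathbf{Z}}_2 \rightarrow \mathbf{X}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2 \rightarrow \mathbf{X}_G + \hat{\mathbf{X}}_G + \tilde{\mathbf{Z}}_2 + \hat{\mathbf{Z}}_2. \tag{90} $$

Substituting (81) and (88) into (78), we conclude that the difference between the objective 
functions of $P$ and $\tilde{P}$ is nonpositive for any admissible $\mathbf{X}$ (i.e., $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$) and any $\mu \geq 1$. 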

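Step 4: Proof of $(\tilde{P}) = (\tilde{P}_G)$. To show that $(\tilde{P}) = (\tilde{P}_G)$, it is sufficient to show that $\mathbf{X}_G^*$, 
the optimal solution of $P_G$, is also an optimal solution of $\tilde{P}$. We consider the cases $\mu = 1$ 
and $\mu > 1$ separately. 

First assume that $\mu > 1$. By Lemma 9, $\mathbf{K}_{\tilde{Z}_2} \succeq \mathbf{K}_{\tilde{Z}_1}$. So we can write $\tilde{\mathbf{Z}}_2 = \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}$, where 
$\hat{\mathbf{Z}}$ is Gaussian and independent of $\tilde{\mathbf{Z}}_1$. We have 

\begin{align}
h(\mathbf{X} + \tilde{\mathbf{Z}}_1) - \mu h(\mathbf{X} + \tilde{\mathbf{Z}}_2) &= h(\mathbf{X} + \tilde{\mathbf{Z}}_1) - \mu h(\mathbf{X} + \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}) \tag{91} \\
&\leq h(\mathbf{X} + \tilde{\mathbf{Z}}_1) - \frac{\mu n}{2}\log\left(\exp\left[\frac{2}{n}h(\mathbf{X} + \tilde{\mathbf{Z}}_1)\right] + \exp\left[\frac{2}{n}h(\hat{\mathbf{Z}})\right]\right) \tag{92} \\
&\leq f\left(h(\hat{\mathbf{Z}}) - \frac{n}{2}\log(\mu - 1);\ h(\hat{\mathbf{Z}})\right), \tag{93}
\end{align}

where (92) follows from the classical EPI, and the function $f$ in (93) was defined in (4). 
Next, we verify that the upper bound on the right-hand side of (93) is achieved by $\mathbf{X}_G^*$. 
Substituting (72) and (73) into the KKT-like condition (69), we obtain 

$$ (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})^{-1} = \mu(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_2})^{-1}, \tag{94} $$

which, together with $\mathbf{K}_{\tilde{Z}_2} = \mathbf{K}_{\tilde{Z}_1} + \mathbf{K}_{\hat{Z}}$, gives 

$$ \mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1} = (\mu - 1)^{-1}\mathbf{K}_{\hat{Z}}. \tag{95} $$

Hence, $\mathbf{X}_G^* + \tilde{\mathbf{Z}}_1$ and $\hat{\mathbf{Z}}$ have proportional covariance matrices and inequality (92) holds with 
equality. Further, by (95), 

$$ h(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_1) = h(\hat{\mathbf{Z}}) - \frac{n}{2}\log(\mu - 1). \tag{96} $$

A comparison of (96) and (5) confirms that $h(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_1)$ achieves the global maximum of the 
function $f(t; h(\hat{\mathbf{Z}}))$, i.e., inequality (93) becomes an equality with $\mathbf{X}_G^*$. We thus conclude that 
$\mathbf{X}_G^*$ is an optimal solution of $\tilde{P}$ for all $\mu > 1$. 

For $\mu = 1$, we have from (94) that $\mathbf{K}_{\tilde{Z}_1} = \mathbf{K}_{\tilde{Z}_2}$. So the objective function of $\tilde{P}$ is constant, 
and $\mathbf{X}_G^*$ is trivially an optimal solution of $\tilde{P}$. 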
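
The role of the maximizer $t = a - \frac{n}{2}\log(\mu - 1)$ in the last step can be checked numerically. The sketch below is illustrative only: it assumes that $f$ has the form $f(t; a) = t - \frac{\mu n}{2}\log\left(\exp[2t/n] + \exp[2a/n]\right)$, which is the form appearing on the right-hand side of (92), and the helper names `f`, `argmax_closed_form` and `argmax_grid` are hypothetical.

```python
import math

def f(t, a, mu, n=1):
    """Assumed form of f(t; a): t - (mu*n/2) * log(exp(2t/n) + exp(2a/n))."""
    return t - 0.5 * mu * n * math.log(math.exp(2.0 * t / n) + math.exp(2.0 * a / n))

def argmax_closed_form(a, mu, n=1):
    """Stationary point of f in t for mu > 1: t = a - (n/2) * log(mu - 1)."""
    return a - 0.5 * n * math.log(mu - 1.0)

def argmax_grid(a, mu, n=1, lo=-10.0, hi=10.0, steps=20001):
    """Brute-force maximizer of f(.; a) over a uniform grid on [lo, hi]."""
    best_t, best_val = lo, f(lo, a, mu, n)
    for k in range(1, steps):
        t = lo + (hi - lo) * k / (steps - 1)
        val = f(t, a, mu, n)
        if val > best_val:
            best_t, best_val = t, val
    return best_t
```

Comparing `argmax_grid(0.3, 2.5, n=2)` with `argmax_closed_form(0.3, 2.5, n=2)` should give agreement up to the grid spacing, which is the property used when comparing (96) with (5).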

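Step 5: Proof of $(\tilde{P}_G) = (P_G)$. Note that $\mathbf{X}_G^*$ is an optimal solution of both $P_G$ and $\tilde{P}_G$. 
So to show that $(\tilde{P}_G) = (P_G)$, we only need to compare the objective functions of $P_G$ and $\tilde{P}_G$ 
evaluated at $\mathbf{X}_G^*$. The following result, which is a minor generalization of [7, Lemma 11, 12], 
shows that the objective functions of $P_G$ and $\tilde{P}_G$ take equal values at $\mathbf{X}_G^*$. 

Lemma 10 For $\mathbf{K}_X^*$, $\mathbf{K}_{Z_i}$, $\mathbf{K}_{\tilde{Z}_i}$, $\mathbf{M}_i$, $i = 1, 2$, defined through (69) to (73), and $\mu \geq 1$, we 
have 

$$ (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})^{-1}\mathbf{K}_{\tilde{Z}_1} = (\mathbf{K}_X^* + \mathbf{K}_{Z_1})^{-1}\mathbf{K}_{Z_1}, \tag{97} $$

$$ (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_2})^{-1}(\mathbf{S} + \mathbf{K}_{\tilde{Z}_2}) = (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}(\mathbf{S} + \mathbf{K}_{Z_2}). \tag{98} $$

Combining Steps 1-5, we conclude that for any $\mu \geq 1$ and any positive semidefinite $\mathbf{S}$, a 
Gaussian $\mathbf{X}$ is an optimal solution of (8). This completes the direct proof of Theorem 1. 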

A few comments on why we need the auxiliary optimization problem $\tilde{P}$ are now in order. 
For the classical EPI to be tight, we need $\mathbf{K}_X^* + \mathbf{K}_{Z_1}$ and $\mathbf{K}_X^* + \mathbf{K}_{Z_2}$ to be proportional 
to each other. However, by the KKT-like condition (69), a guarantee of proportionality 
requires both multipliers $\mathbf{M}_1$ and $\mathbf{M}_2$ to be zero. The purpose of enhancement is to absorb the 
(possibly) nonzero Lagrange multipliers $\mathbf{M}_1$, $\mathbf{M}_2$ into the covariance matrices of $\mathbf{Z}_1$ and $\mathbf{Z}_2$, 
creating a new optimization problem which can be solved directly by the classical EPI. The 
constant $F$ is needed to make sure that $(\tilde{P}_G) = (P_G)$; the choice of $F$ is motivated by the 
vector Gaussian broadcast channel problem. 

B Proof of Lemma 3 

We first give some preliminaries on Fisher information and score function. This material can 
be found, for example, in [15, Chapter 3.2]. 



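Definition 11 For a random vector $\mathbf{U}$ with a differentiable density function $f_U$ in $\mathcal{R}^n$, the 
Fisher information matrix $\mathbf{J}(\cdot)$ is defined as 

$$ \mathbf{J}(\mathbf{U}) := E[\rho_U(\mathbf{U})\rho_U^t(\mathbf{U})], \tag{99} $$

where the vector-valued score function $\rho_U(\cdot)$ is defined as 

$$ \rho_U(\mathbf{u}) := \nabla \log f_U(\mathbf{u}) = \left(\frac{\partial}{\partial u_1}\log f_U(\mathbf{u}), \cdots, \frac{\partial}{\partial u_n}\log f_U(\mathbf{u})\right)^t. \tag{100} $$

The following results on score function are known. 

Lemma 12 The following statements on score function are true. 

1. (Gaussian Distribution) If $\mathbf{U}$ is a Gaussian vector with zero mean and positive definite 
covariance matrix $\mathbf{K}_U$, then 

$$ \rho_U(\mathbf{u}) = -\mathbf{K}_U^{-1}\mathbf{u}. \tag{101} $$

2. (Stein Identity) For any smooth scalar-valued function $g$ well behaved at infinity, we 
have 

$$ E[g(\mathbf{U})\rho_U(\mathbf{U})] = -E[\nabla g(\mathbf{U})]. \tag{102} $$

In particular, we have 

$$ E[\rho_U(\mathbf{U})] = 0 \quad \text{and} \quad E[\mathbf{U}\rho_U^t(\mathbf{U})] = -\mathbf{I}, \tag{103} $$

where $\mathbf{I}$ is the identity matrix. 

3. (Behavior on Convolution) If $\mathbf{U}$, $\mathbf{V}$ are two independent random vectors and $\mathbf{W} = 
\mathbf{U} + \mathbf{V}$, then 

$$ \rho_W(\mathbf{w}) = E[\rho_U(\mathbf{U}) | \mathbf{W} = \mathbf{w}] = E[\rho_V(\mathbf{V}) | \mathbf{W} = \mathbf{w}]. \tag{104} $$

We now use the above properties of score function to prove Lemma 3. We first prove 
the Cramér-Rao inequality. The Fisher information matrix $\mathbf{J}(\mathbf{U})$ does not depend on the 
mean of $\mathbf{U}$, so without loss of generality we can assume that $\mathbf{U}$ has zero mean. We have 

\begin{align}
0 &\preceq E[(\rho_U(\mathbf{U}) + \mathbf{K}_U^{-1}\mathbf{U})(\rho_U(\mathbf{U}) + \mathbf{K}_U^{-1}\mathbf{U})^t] \tag{105} \\
&= E[\rho_U(\mathbf{U})\rho_U^t(\mathbf{U})] + \mathbf{K}_U^{-1}E[\mathbf{U}\rho_U^t(\mathbf{U})] + E[\rho_U(\mathbf{U})\mathbf{U}^t]\mathbf{K}_U^{-1} + \mathbf{K}_U^{-1}E[\mathbf{U}\mathbf{U}^t]\mathbf{K}_U^{-1} \tag{106} \\
&= \mathbf{J}(\mathbf{U}) - \mathbf{K}_U^{-1} - \mathbf{K}_U^{-1} + \mathbf{K}_U^{-1} \tag{107} \\
&= \mathbf{J}(\mathbf{U}) - \mathbf{K}_U^{-1}. \tag{108}
\end{align}

Here in (107) we use the facts that 

$$ E[\rho_U(\mathbf{U})\rho_U^t(\mathbf{U})] = \mathbf{J}(\mathbf{U}) \tag{109} $$

by the definition of Fisher information matrix and that 

$$ E[\mathbf{U}\rho_U^t(\mathbf{U})] = E[\rho_U(\mathbf{U})\mathbf{U}^t] = -\mathbf{I} \tag{110} $$

by the Stein identity (103). We conclude that $\mathbf{J}(\mathbf{U}) \succeq \mathbf{K}_U^{-1}$ for any random vector $\mathbf{U}$ with a strictly 
positive definite covariance matrix $\mathbf{K}_U$. 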

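The matrix FII can be proved similarly: 

\begin{align}
0 &\preceq E[(\rho_W(\mathbf{W}) - \mathbf{A}\rho_U(\mathbf{U}) - (\mathbf{I} - \mathbf{A})\rho_V(\mathbf{V}))(\rho_W(\mathbf{W}) - \mathbf{A}\rho_U(\mathbf{U}) - (\mathbf{I} - \mathbf{A})\rho_V(\mathbf{V}))^t] \tag{111} \\
&= E[\rho_W(\mathbf{W})\rho_W^t(\mathbf{W})] + \mathbf{A}E[\rho_U(\mathbf{U})\rho_U^t(\mathbf{U})]\mathbf{A}^t + (\mathbf{I} - \mathbf{A})E[\rho_V(\mathbf{V})\rho_V^t(\mathbf{V})](\mathbf{I} - \mathbf{A})^t \notag \\
&\quad - E[\rho_W(\mathbf{W})\rho_U^t(\mathbf{U})]\mathbf{A}^t - \mathbf{A}E[\rho_U(\mathbf{U})\rho_W^t(\mathbf{W})] \notag \\
&\quad - E[\rho_W(\mathbf{W})\rho_V^t(\mathbf{V})](\mathbf{I} - \mathbf{A})^t - (\mathbf{I} - \mathbf{A})E[\rho_V(\mathbf{V})\rho_W^t(\mathbf{W})] \notag \\
&\quad + \mathbf{A}E[\rho_U(\mathbf{U})\rho_V^t(\mathbf{V})](\mathbf{I} - \mathbf{A})^t + (\mathbf{I} - \mathbf{A})E[\rho_V(\mathbf{V})\rho_U^t(\mathbf{U})]\mathbf{A}^t. \tag{112}
\end{align}

By the definition of Fisher information matrix, 

$$ E[\rho_W(\mathbf{W})\rho_W^t(\mathbf{W})] = \mathbf{J}(\mathbf{W}), \quad E[\rho_U(\mathbf{U})\rho_U^t(\mathbf{U})] = \mathbf{J}(\mathbf{U}), \quad E[\rho_V(\mathbf{V})\rho_V^t(\mathbf{V})] = \mathbf{J}(\mathbf{V}). \tag{113} $$

By the convolution behavior of score function, 

$$ E[\rho_W(\mathbf{W})\rho_U^t(\mathbf{U})] = E[\rho_W(\mathbf{W})E[\rho_U^t(\mathbf{U}) | \mathbf{W}]] = E[\rho_W(\mathbf{W})\rho_W^t(\mathbf{W})] = \mathbf{J}(\mathbf{W}) \tag{114} $$

and similarly 

$$ E[\rho_U(\mathbf{U})\rho_W^t(\mathbf{W})] = \mathbf{J}(\mathbf{W}), \quad E[\rho_W(\mathbf{W})\rho_V^t(\mathbf{V})] = E[\rho_V(\mathbf{V})\rho_W^t(\mathbf{W})] = \mathbf{J}(\mathbf{W}). \tag{115} $$

Finally, since $\mathbf{U}$, $\mathbf{V}$ are independent and by the Stein identity with $g \equiv 1$, we have 

$$ E[\rho_U(\mathbf{U})\rho_V^t(\mathbf{V})] = E[\rho_U(\mathbf{U})]E[\rho_V^t(\mathbf{V})] = 0 \tag{116} $$

and similarly 

$$ E[\rho_V(\mathbf{V})\rho_U^t(\mathbf{U})] = 0. \tag{117} $$

Substituting (113)-(117) into (112), we obtain 

\begin{align}
0 &\preceq \mathbf{J}(\mathbf{W}) + \mathbf{A}\mathbf{J}(\mathbf{U})\mathbf{A}^t + (\mathbf{I} - \mathbf{A})\mathbf{J}(\mathbf{V})(\mathbf{I} - \mathbf{A})^t - \mathbf{J}(\mathbf{W})\mathbf{A}^t - \mathbf{A}\mathbf{J}(\mathbf{W}) - \mathbf{J}(\mathbf{W})(\mathbf{I} - \mathbf{A})^t - (\mathbf{I} - \mathbf{A})\mathbf{J}(\mathbf{W}) \tag{118} \\
&= -\mathbf{J}(\mathbf{W}) + \mathbf{A}\mathbf{J}(\mathbf{U})\mathbf{A}^t + (\mathbf{I} - \mathbf{A})\mathbf{J}(\mathbf{V})(\mathbf{I} - \mathbf{A})^t, \tag{119}
\end{align}

which gives 

$$ \mathbf{J}(\mathbf{W}) \preceq \mathbf{A}\mathbf{J}(\mathbf{U})\mathbf{A}^t + (\mathbf{I} - \mathbf{A})\mathbf{J}(\mathbf{V})(\mathbf{I} - \mathbf{A})^t \tag{120} $$

for any square matrix $\mathbf{A}$. This completes the proof. 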
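
As a sanity check of the matrix FII (120), the scalar Gaussian case can be evaluated in closed form, since the Fisher information of a scalar Gaussian with variance $\sigma^2$ is $1/\sigma^2$. The following Python sketch is an illustration under that Gaussian assumption only; the function names are hypothetical.

```python
def fisher_gauss(var):
    """Fisher information of a scalar Gaussian with variance `var`."""
    return 1.0 / var

def fii_rhs(a, var_u, var_v):
    """Right-hand side of (120) in the scalar case: a^2 J(U) + (1 - a)^2 J(V)."""
    return a * a * fisher_gauss(var_u) + (1.0 - a) ** 2 * fisher_gauss(var_v)

def fii_gap(a, var_u, var_v):
    """RHS minus LHS of (120) for independent scalar Gaussians U, V and W = U + V.

    The inequality (120) states that this gap is nonnegative for every real a.
    """
    return fii_rhs(a, var_u, var_v) - fisher_gauss(var_u + var_v)
```

In this Gaussian case the gap vanishes at `a = var_u / (var_u + var_v)` and is positive elsewhere, which matches the fact that (120) holds for every $\mathbf{A}$ and is tight for a suitable choice of $\mathbf{A}$.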



C A Perturbation Proof of Theorem 1 

We first give the outline of the proof. 

Proof Outline. Without loss of generality, let us assume that $\mathbf{S} \succ 0$. To show that a 
Gaussian $\mathbf{X}$ is an optimal solution of $P$, it is sufficient to show that $(P) = (P_G)$. We have 
$(P) \geq (P_G)$ (for free); we only need to show that $(P) \leq (P_G)$. For that purpose we shall 
consider the auxiliary optimization problem $\bar{P}$: 

$$
\begin{array}{ll}
\max_{p(\mathbf{x})} & h(\mathbf{X} + \tilde{\mathbf{Z}}_1) - \mu h(\mathbf{X} + \mathbf{Z}_2) + h(\mathbf{Z}_1) - h(\tilde{\mathbf{Z}}_1) \\
\text{subject to} & \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S},
\end{array}
\tag{121}
$$

where the maximization is over all random vectors $\mathbf{X}$ independent of $\tilde{\mathbf{Z}}_1$ and $\mathbf{Z}_2$. Compared 
with the auxiliary optimization problem $\tilde{P}$ in the direct proof, this enhancement is only on 
$\mathbf{Z}_1$. Following the same steps as those in the direct proof, we can show that $(P) \leq (\bar{P})$ 
and $(\bar{P}_G) = (P_G)$. (In proving $(P) \leq (\bar{P})$, only the equations (79)-(81) and the Markov 
chain (82) in Appendix A are needed.) All we need to show now is that $(\bar{P}) = (\bar{P}_G)$. 

Proof of $(\bar{P}) = (\bar{P}_G)$. To show that $(\bar{P}) = (\bar{P}_G)$, we shall show that $\mathbf{X}_G^*$ is a global 
optimal solution of $\bar{P}$. For that we shall prove the following strong result: for any admissible 
random vector $\mathbf{X}$ there is a monotone increasing path connecting $\mathbf{X}$ and $\mathbf{X}_G^*$ (see Figure 1). 

We consider the "covariance-preserving" transformation of Dembo et al. [10]: 

$$ \mathbf{X}_\lambda = \sqrt{1 - \lambda}\,\mathbf{X} + \sqrt{\lambda}\,\mathbf{X}_G^*, \quad \lambda \in [0, 1]. \tag{122} $$

Then $\{\mathbf{X}_\lambda\}$ is a family of distributions indexed by $\lambda \in [0, 1]$ and connecting $\mathbf{X}$ (when $\lambda = 0$) 
with $\mathbf{X}_G^*$ (when $\lambda = 1$). Let $g(\lambda)$ be the objective function of $\bar{P}$ evaluated along the path: 

$$ g(\lambda) := h(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1) - \mu h(\mathbf{X}_\lambda + \mathbf{Z}_2) + h(\mathbf{Z}_1) - h(\tilde{\mathbf{Z}}_1). \tag{123} $$

[Figure: the path $\mathbf{X}_\lambda := \sqrt{1 - \lambda}\,\mathbf{X} + \sqrt{\lambda}\,\mathbf{X}_G^*$ from an admissible $\mathbf{X}$ with $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$ to $\mathbf{X}_G^*$.] 
Figure 1: A monotone path connecting $\mathbf{X}$ and $\mathbf{X}_G^*$. 

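Next, we calculate the derivative of $g$ with respect to $\lambda$. 

Note that $\tilde{\mathbf{Z}}_1$ is Gaussian and that a Gaussian distribution is stable under convolution. 
We can write 

$$ \tilde{\mathbf{Z}}_1 = \sqrt{1 - \lambda}\,\tilde{\mathbf{Z}}_{1,1} + \sqrt{\lambda}\,\tilde{\mathbf{Z}}_{1,2}, \tag{124} $$

where $\tilde{\mathbf{Z}}_{1,1}$, $\tilde{\mathbf{Z}}_{1,2}$ are independent and have the same distribution as that of $\tilde{\mathbf{Z}}_1$. We have 

\begin{align}
h(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1) &= h(\sqrt{1 - \lambda}\,\mathbf{X} + \sqrt{\lambda}\,\mathbf{X}_G^* + \tilde{\mathbf{Z}}_1) \tag{125} \\
&= h(\sqrt{1 - \lambda}(\mathbf{X} + \tilde{\mathbf{Z}}_{1,1}) + \sqrt{\lambda}(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_{1,2})) \tag{126} \\
&= h(\mathbf{X} + \tilde{\mathbf{Z}}_{1,1} + \sqrt{\lambda(1 - \lambda)^{-1}}(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_{1,2})) + \frac{n}{2}\log(1 - \lambda). \tag{127}
\end{align}

By the (vector) de Bruijn identity [10, Theorem 14], 

\begin{align}
2(1 - \lambda)\frac{d}{d\lambda}h(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1) &= (1 - \lambda)^{-1}\mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X} + \tilde{\mathbf{Z}}_{1,1} + \sqrt{\lambda(1 - \lambda)^{-1}}(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_{1,2}))\right) - n \tag{128} \\
&= \mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\sqrt{1 - \lambda}(\mathbf{X} + \tilde{\mathbf{Z}}_{1,1}) + \sqrt{\lambda}(\mathbf{X}_G^* + \tilde{\mathbf{Z}}_{1,2}))\right) - n \tag{129} \\
&= \mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)\right) - n. \tag{130}
\end{align}

Similarly, we have 

$$ 2(1 - \lambda)\frac{d}{d\lambda}h(\mathbf{X}_\lambda + \mathbf{Z}_2) = \mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{Z_2})\mathbf{J}(\mathbf{X}_\lambda + \mathbf{Z}_2)\right) - n. \tag{131} $$

Combining (130) and (131), we have 

$$ 2(1 - \lambda)g'(\lambda) = \mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1) - \mu(\mathbf{K}_X^* + \mathbf{K}_{Z_2})\mathbf{J}(\mathbf{X}_\lambda + \mathbf{Z}_2)\right) + n(\mu - 1). \tag{132} $$

By the definition of $\mathbf{K}_{\tilde{Z}_1}$ and the KKT-like condition (69), we have 

$$ \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})^{-1} = \frac{\mu}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} + \mathbf{M}_2. \tag{133} $$

By the facts that $\mu \geq 1$ and $\mathbf{M}_2 \succeq 0$, we obtain from (133) that 

$$ \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})^{-1} \succeq \frac{1}{2}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} \tag{134} $$

and hence that 

$$ \mathbf{K}_{Z_2} \succeq \mathbf{K}_{\tilde{Z}_1}. \tag{135} $$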
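We can now write $\mathbf{Z}_2 = \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}$, where $\hat{\mathbf{Z}}$ is Gaussian and independent of $\tilde{\mathbf{Z}}_1$. Applying the 
matrix FII of Lemma 3 with 

$$ \mathbf{A} = (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) \quad \text{and} \quad \mathbf{I} - \mathbf{A} = (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}\mathbf{K}_{\hat{Z}}, \tag{136} $$

we have 

\begin{align}
\mathbf{J}(\mathbf{X}_\lambda + \mathbf{Z}_2) &= \mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1 + \hat{\mathbf{Z}}) \tag{137} \\
&\preceq (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} \notag \\
&\quad + (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}\mathbf{K}_{\hat{Z}}\mathbf{J}(\hat{\mathbf{Z}})\mathbf{K}_{\hat{Z}}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} \tag{138} \\
&= (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} \notag \\
&\quad + (\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}\mathbf{K}_{\hat{Z}}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}, \tag{139}
\end{align}

where the last equality follows from the fact that $\hat{\mathbf{Z}}$ is Gaussian so 

$$ \mathbf{K}_{\hat{Z}}\mathbf{J}(\hat{\mathbf{Z}}) = \mathbf{I}. \tag{140} $$

Substituting (139) into (132) and using the fact that $\mathbf{K}_{\hat{Z}} = \mathbf{K}_{Z_2} - \mathbf{K}_{\tilde{Z}_1}$, we obtain 

\begin{align}
2(1 - \lambda)g'(\lambda) &\geq \mathrm{Tr}\Big((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1) - \mu(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1} \notag \\
&\qquad - \mu\mathbf{K}_{\hat{Z}}(\mathbf{K}_X^* + \mathbf{K}_{Z_2})^{-1}\Big) + n(\mu - 1) \tag{141} \\
&= 2\,\mathrm{Tr}\left(\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\right)\mathbf{M}_2\right), \tag{142}
\end{align}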

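where the equality follows from (133). Further, by the Cramér-Rao inequality of Lemma 3, 

\begin{align}
& (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) \notag \\
&\quad \succeq (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathrm{Cov}^{-1}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) \tag{143} \\
&\quad = (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\left((1 - \lambda)\mathrm{Cov}(\mathbf{X}) + \lambda\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}\right)^{-1}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) \tag{144} \\
&\quad \succeq (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\left((1 - \lambda)\mathbf{S} + \lambda\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}\right)^{-1}(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) \tag{145} \\
&\quad = -(1 - \lambda)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\left((1 - \lambda)\mathbf{S} + \lambda\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}\right)^{-1}(\mathbf{S} - \mathbf{K}_X^*). \tag{146}
\end{align}

Substitute (146) into (142) and recall from the KKT-like condition (71) that $(\mathbf{S} - \mathbf{K}_X^*)\mathbf{M}_2 = 0$. 
We have 

\begin{align}
& \mathrm{Tr}\left(\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\mathbf{J}(\mathbf{X}_\lambda + \tilde{\mathbf{Z}}_1)(\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}) - (\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\right)\mathbf{M}_2\right) \notag \\
&\quad \geq -(1 - \lambda)\,\mathrm{Tr}\left((\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1})\left((1 - \lambda)\mathbf{S} + \lambda\mathbf{K}_X^* + \mathbf{K}_{\tilde{Z}_1}\right)^{-1}(\mathbf{S} - \mathbf{K}_X^*)\mathbf{M}_2\right) \tag{147} \\
&\quad = 0. \tag{148}
\end{align}

We conclude that 

$$ g'(\lambda) \geq 0, \quad \forall \lambda \in [0, 1], \tag{149} $$

i.e., $\{\mathbf{X}_\lambda\}$ is a monotone increasing path connecting $\mathbf{X}$ and $\mathbf{X}_G^*$. We have found a monotone 
increasing path for every admissible $\mathbf{X}$, so $\mathbf{X}_G^*$ is an optimal solution of $\bar{P}$. This completes 
the perturbation proof of Theorem 1. 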
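
The monotonicity of the path can be observed directly in the special case where the starting point $\mathbf{X}$ is itself a scalar Gaussian, because then every entropy in (123) has a closed form. The sketch below is a hedged illustration of that special case only (the proof above covers arbitrary admissible $\mathbf{X}$); the function names are hypothetical, additive constants that do not depend on $\lambda$ are dropped, and the numerical parameters are taken from two hand-worked scalar instances of the KKT-like conditions.

```python
import math

def g_gaussian_path(lam, var_x, k_star, k1_tilde, k2, mu):
    """Objective (123), up to an additive constant, for a scalar Gaussian X.

    var_x    : Var(X), assumed to satisfy var_x <= s
    k_star   : variance of the optimal Gaussian solution X_G^*
    k1_tilde : enhanced variance of Z_1
    k2       : variance of Z_2
    """
    v = (1.0 - lam) * var_x + lam * k_star  # Var(X_lambda) along the path (122)
    return 0.5 * math.log(v + k1_tilde) - 0.5 * mu * math.log(v + k2)

def is_nondecreasing(values, tol=1e-12):
    """True if the sequence never drops by more than `tol`."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))
```

For instance, with `k1 = 1`, `k2 = 2`, `mu = 3`, `s = 1` the optimal variance is `k_star = 0` and the enhanced variance is `k1_tilde = 2/3`; evaluating `g_gaussian_path` on a grid of `lam` values in `[0, 1]` for `var_x = 0.5` gives a nondecreasing sequence, in line with (149).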

A few comments on the difference between the direct proof and the perturbation proof of 
Theorem 1 are now in order. In the direct proof, we enhance both $\mathbf{Z}_1$ and $\mathbf{Z}_2$ to obtain the 
proportionality so that the classical EPI can be applied to solve the auxiliary optimization 
problem $\tilde{P}$. For the perturbation proof, however, we only need to enhance $\mathbf{Z}_1$. (If the 
lower constraint $\mathbf{K}_X \succeq 0$ does not bite, i.e., $\mathbf{M}_1 = 0$, no enhancement is needed at all.) 
A direct perturbation is then used to show that $\mathbf{X}_G^*$ is an optimal solution of the auxiliary 
optimization problem $\bar{P}$. Neither the classical EPI nor the worst noise result of Lemma 2 is 
needed in the perturbation proof. 



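D Proof of Corollary 6 

For any random vector $\mathbf{X}$ in $\mathcal{R}^2$ such that $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$ and any $\mu \geq 1$, we have from 
Theorem 1 that 

$$ h(\mathbf{X} + \mathbf{Z}_1) - \mu h(\mathbf{X} + \mathbf{Z}_2) \leq \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left((2\pi e)^2|\mathbf{K}_X + \mathbf{K}_{Z_1}|\right) - \frac{\mu}{2}\log\left((2\pi e)^2|\mathbf{K}_X + \mathbf{K}_{Z_2}|\right)\right\}. \tag{150} $$

Adding a constant term $\mu h(\mathbf{Z}_2) - h(\mathbf{Z}_1)$ to both sides of (150), we obtain 

$$ I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_1) - \mu I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_2) \leq \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_1}^{-1}\mathbf{K}_X\right| - \frac{\mu}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_2}^{-1}\mathbf{K}_X\right|\right\}. \tag{151} $$

Let $\mathbf{K}_{Z_i} = \mathbf{V}_i\mathbf{\Sigma}_i\mathbf{V}_i^t$, where $\mathbf{V}_i = (\mathbf{v}_{i1}, \mathbf{v}_{i2})$ is an orthonormal matrix and $\mathbf{\Sigma}_i = \mathrm{Diag}(\lambda_{i1}, \lambda_{i2})$ is 
a diagonal matrix. Next, we consider taking the limits of both sides of (151) as $\lambda_{12}, \lambda_{21} \rightarrow \infty$. 

First consider the limit of the left-hand side of (151). We need the following simple 
lemma. 

Lemma 13 Let $\mathbf{Z} = (Z_1, Z_2)^t$, where $Z_1$, $Z_2$ are two independent Gaussian variables with 
variances $\sigma_1^2$ and $\sigma_2^2$, respectively. For any random vector $\mathbf{X} = (X_1, X_2)^t$ with finite variances 
and independent of $\mathbf{Z}$, we have 

$$ \lim_{\sigma_2^2 \rightarrow \infty} I(\mathbf{X}; \mathbf{X} + \mathbf{Z}) = I(X_1; X_1 + Z_1). \tag{152} $$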



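Proof. By the chain rule of mutual information, 

$$ I(\mathbf{X}; \mathbf{X} + \mathbf{Z}) = I(X_1; X_1 + Z_1) + I(X_1; X_2 + Z_2 | X_1 + Z_1) + I(X_2; \mathbf{X} + \mathbf{Z} | X_1). \tag{153} $$

Due to the Markov chains $X_1 + Z_1 \rightarrow X_1 \rightarrow X_2 + Z_2$ and $X_1 \rightarrow X_2 \rightarrow X_2 + Z_2$, we have 

$$ I(X_1; X_2 + Z_2 | X_1 + Z_1) \leq I(X_1; X_2 + Z_2) \leq I(X_2; X_2 + Z_2). \tag{154} $$

Furthermore, we have 

\begin{align}
I(X_2; \mathbf{X} + \mathbf{Z} | X_1) &= I(X_2; X_2 + Z_2 | X_1) + I(X_2; X_1 + Z_1 | X_1, X_2 + Z_2) \tag{155} \\
&= I(X_2; X_2 + Z_2 | X_1) + I(X_2; Z_1 | X_1, X_2 + Z_2) \tag{156} \\
&= I(X_2; X_2 + Z_2 | X_1) \tag{157} \\
&\leq I(X_2; X_2 + Z_2), \tag{158}
\end{align}

where (157) follows from the fact that $Z_1$ is independent of $Z_2$ and $\mathbf{X}$, so $I(X_2; Z_1 | X_1, X_2 + Z_2) = 0$, 
and (158) is due to the Markov chain $X_1 \rightarrow X_2 \rightarrow X_2 + Z_2$. Note that 

$$ \lim_{\sigma_2^2 \rightarrow \infty} I(X_2; X_2 + Z_2) \leq \lim_{\sigma_2^2 \rightarrow \infty} \frac{1}{2}\log\left(1 + \frac{\mathrm{Var}(X_2)}{\sigma_2^2}\right) = 0 \tag{159} $$

with finite $\mathrm{Var}(X_2)$. We thus have from (154) and (158) that both $I(X_1; X_2 + Z_2 | X_1 + Z_1)$ 
and $I(X_2; \mathbf{X} + \mathbf{Z} | X_1)$ tend to zero in the limit as $\sigma_2^2 \rightarrow \infty$. The desired result (152) follows 
by taking the limit $\sigma_2^2 \rightarrow \infty$ on both sides of (153), which completes the proof. 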
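
Lemma 13 can be illustrated in the jointly Gaussian case, where $I(\mathbf{X}; \mathbf{X} + \mathbf{Z}) = \frac{1}{2}\log|\mathbf{I} + \mathbf{K}_Z^{-1}\mathbf{K}_X|$ is available in closed form. The Python sketch below assumes a zero-mean Gaussian $\mathbf{X}$ (an assumption made only for this illustration; the lemma itself does not require it), works in nats, and uses hypothetical function names.

```python
import math

def mi_gaussian_2d(k11, k12, k22, s1, s2):
    """I(X; X + Z) in nats for Gaussian X with covariance [[k11, k12], [k12, k22]]
    and independent Gaussian Z = (Z1, Z2) with variances s1 and s2.

    Uses (1/2) log |I + K_Z^{-1} K_X| with the 2x2 determinant written out.
    """
    det = (1.0 + k11 / s1) * (1.0 + k22 / s2) - (k12 * k12) / (s1 * s2)
    return 0.5 * math.log(det)

def mi_gaussian_1d(k11, s1):
    """I(X1; X1 + Z1) in nats for scalar Gaussian X1 and Z1."""
    return 0.5 * math.log(1.0 + k11 / s1)
```

Evaluating `mi_gaussian_2d(2.0, 0.5, 1.0, 1.0, s2)` for increasing `s2` shows the value decreasing toward `mi_gaussian_1d(2.0, 1.0)`, which is the limit asserted in (152).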

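Let $\bar{\mathbf{Z}}_i = (\bar{Z}_{i1}, \bar{Z}_{i2})^t = \mathbf{V}_i^t\mathbf{Z}_i$. Then $\bar{Z}_{i1}$ and $\bar{Z}_{i2}$ are independent. By Lemma 13, 

$$ \lim_{\lambda_{12} \rightarrow \infty} I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_1) = \lim_{\lambda_{12} \rightarrow \infty} I(\mathbf{V}_1^t\mathbf{X}; \mathbf{V}_1^t\mathbf{X} + \mathbf{V}_1^t\mathbf{Z}_1) = I(\mathbf{v}_{11}^t\mathbf{X}; \mathbf{v}_{11}^t\mathbf{X} + \bar{Z}_{11}), \tag{160} $$

$$ \lim_{\lambda_{21} \rightarrow \infty} I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_2) = \lim_{\lambda_{21} \rightarrow \infty} I(\mathbf{V}_2^t\mathbf{X}; \mathbf{V}_2^t\mathbf{X} + \mathbf{V}_2^t\mathbf{Z}_2) = I(\mathbf{v}_{22}^t\mathbf{X}; \mathbf{v}_{22}^t\mathbf{X} + \bar{Z}_{22}), \tag{161} $$

which gives 

$$ \lim_{\lambda_{12}, \lambda_{21} \rightarrow \infty} \left\{I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_1) - \mu I(\mathbf{X}; \mathbf{X} + \mathbf{Z}_2)\right\} = I(\mathbf{v}_{11}^t\mathbf{X}; \mathbf{v}_{11}^t\mathbf{X} + \bar{Z}_{11}) - \mu I(\mathbf{v}_{22}^t\mathbf{X}; \mathbf{v}_{22}^t\mathbf{X} + \bar{Z}_{22}). \tag{162} $$

Next, we consider the limit of the right-hand side of (151). For any positive semidefinite $\mathbf{K}_X$, we 
have 

\begin{align}
& \lim_{\lambda_{12}, \lambda_{21} \rightarrow \infty} \left\{\frac{1}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_1}^{-1}\mathbf{K}_X\right| - \frac{\mu}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_2}^{-1}\mathbf{K}_X\right|\right\} \notag \\
&\quad = \lim_{\lambda_{12}, \lambda_{21} \rightarrow \infty} \left\{\frac{1}{2}\log\left|\mathbf{I} + \mathbf{\Sigma}_1^{-1}\mathbf{V}_1^t\mathbf{K}_X\mathbf{V}_1\right| - \frac{\mu}{2}\log\left|\mathbf{I} + \mathbf{\Sigma}_2^{-1}\mathbf{V}_2^t\mathbf{K}_X\mathbf{V}_2\right|\right\} \tag{163} \\
&\quad = \frac{1}{2}\log\left(1 + \lambda_{11}^{-1}\mathbf{v}_{11}^t\mathbf{K}_X\mathbf{v}_{11}\right) - \frac{\mu}{2}\log\left(1 + \lambda_{22}^{-1}\mathbf{v}_{22}^t\mathbf{K}_X\mathbf{v}_{22}\right) \tag{164}
\end{align}

due to the continuity of $\log|\mathbf{I} + \mathbf{A}|$ over positive semidefinite $\mathbf{A}$. Moreover, the convergence of 
(164) is uniform in $\mathbf{K}_X$, because the continuity of $\log|\mathbf{I} + \mathbf{A}|$ over $\mathbf{A}$ is uniform and $\mathbf{V}_i^t\mathbf{K}_X\mathbf{V}_i$, 
$i = 1, 2$, are bounded for $0 \preceq \mathbf{K}_X \preceq \mathbf{S}$. We thus have 

\begin{align}
& \lim_{\lambda_{12}, \lambda_{21} \rightarrow \infty} \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_1}^{-1}\mathbf{K}_X\right| - \frac{\mu}{2}\log\left|\mathbf{I} + \mathbf{K}_{Z_2}^{-1}\mathbf{K}_X\right|\right\} \notag \\
&\quad = \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left(1 + \lambda_{11}^{-1}\mathbf{v}_{11}^t\mathbf{K}_X\mathbf{v}_{11}\right) - \frac{\mu}{2}\log\left(1 + \lambda_{22}^{-1}\mathbf{v}_{22}^t\mathbf{K}_X\mathbf{v}_{22}\right)\right\}. \tag{165}
\end{align}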



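Substituting (162) and (165) into (151), we obtain 

$$ I(\mathbf{v}_{11}^t\mathbf{X}; \mathbf{v}_{11}^t\mathbf{X} + \bar{Z}_{11}) - \mu I(\mathbf{v}_{22}^t\mathbf{X}; \mathbf{v}_{22}^t\mathbf{X} + \bar{Z}_{22}) \leq \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left(1 + \lambda_{11}^{-1}\mathbf{v}_{11}^t\mathbf{K}_X\mathbf{v}_{11}\right) - \frac{\mu}{2}\log\left(1 + \lambda_{22}^{-1}\mathbf{v}_{22}^t\mathbf{K}_X\mathbf{v}_{22}\right)\right\} \tag{166} $$

and hence 

$$ h(\mathbf{v}_{11}^t\mathbf{X} + \bar{Z}_{11}) - \mu h(\mathbf{v}_{22}^t\mathbf{X} + \bar{Z}_{22}) \leq \max_{0 \preceq \mathbf{K}_X \preceq \mathbf{S}} \left\{\frac{1}{2}\log\left(2\pi e(\mathbf{v}_{11}^t\mathbf{K}_X\mathbf{v}_{11} + \lambda_{11})\right) - \frac{\mu}{2}\log\left(2\pi e(\mathbf{v}_{22}^t\mathbf{K}_X\mathbf{v}_{22} + \lambda_{22})\right)\right\} \tag{167} $$

for any random vector $\mathbf{X}$ such that $\mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}$ and any $\mu \geq 1$. This completes the proof. 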

E Proof of Corollary 7 

Let $\mathbf{v}_1 = (1, 1)^t$ and $\mathbf{v}_2 = (0, 1)^t$. Consider $\{\mathbf{X} : \mathrm{Var}(X_1) \leq a_1\} = \bigcup_{\mathbf{S}} \{\mathbf{X} : \mathrm{Cov}(\mathbf{X}) \preceq \mathbf{S}\}$, 
where the union is over all $\mathbf{S}$ such that $(\mathbf{S})_{11} = a_1$. By Corollary 6, a Gaussian $(X_1, X_2)$ is an 
optimal solution to the optimization problem 

$$
\begin{array}{ll}
\max_{p(x_1, x_2)} & h(X_1 + X_2 + Z) - \mu h(X_2 + Z) \\
\text{subject to} & \mathrm{Var}(X_1) \leq a_1,
\end{array}
\tag{168}
$$

where $\mu \geq 1$, and the maximization is over all jointly distributed random variables $(X_1, X_2)$ 
independent of $Z$. Let $(X_{1G}^*, X_{2G}^*)$ be the Gaussian optimal solution of (168). Then 

$$ h(X_1 + X_2 + Z) - \mu h(X_2 + Z) \leq h(X_{1G}^* + X_{2G}^* + Z) - \mu h(X_{2G}^* + Z) \tag{169} $$

for any jointly distributed random variables $(X_1, X_2)$ such that $\mathrm{Var}(X_1) \leq a_1$. 

It is easy to verify that $h(X_{2G}^* + Z)$ is a continuous function of $\mu$. When $\mu = \infty$, 
$h(X_{2G}^* + Z) = h(Z)$; when $\mu = 1$, $h(X_{2G}^* + Z) = a_2^*$, where $a_2^*$ was defined in (30). By the 
intermediate value theorem, for any $h(Z) \leq a_2 \leq a_2^*$ there is a $\mu$ for which $h(X_{2G}^* + Z) = a_2$. 
Hence for any jointly distributed random variables $(X_1, X_2)$ such that $\mathrm{Var}(X_1) \leq a_1$ and 
$h(X_2 + Z) \leq a_2$, we have by (169) that 

\begin{align}
h(X_1 + X_2 + Z) &\leq h(X_{1G}^* + X_{2G}^* + Z) + \mu\left(h(X_2 + Z) - h(X_{2G}^* + Z)\right) \tag{170} \\
&\leq h(X_{1G}^* + X_{2G}^* + Z). \tag{171}
\end{align}

We conclude that a Gaussian solution is an optimal solution of (29) for any $a_1 \geq 0$ and any 
$h(Z) \leq a_2 \leq a_2^*$. This completes the proof. 



F Proof of the Outer Bound ( CT ) 



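Let $W_1$ and $W_2$ be the encoded messages for $\{\mathbf{Y}_1[m]\}$ and $\{\mathbf{Y}_2[m]\}$, respectively. Let $\mathbf{Y}_i^m := 
(\mathbf{Y}_i[1], \cdots, \mathbf{Y}_i[m])$, $i = 1, 2$, and $U[m] := (W_2, \mathbf{Y}_1^{m-1})$. We have 

\begin{align}
NR_2 &= H(W_2) \tag{172} \\
&\geq H(W_2) - H(W_2 | \mathbf{Y}_2^N) \tag{173} \\
&= I(W_2; \mathbf{Y}_2^N) \tag{174} \\
&= \sum_{m=1}^N I(W_2; \mathbf{Y}_2[m] | \mathbf{Y}_2^{m-1}) \tag{175} \\
&= \sum_{m=1}^N \left(h(\mathbf{Y}_2[m] | \mathbf{Y}_2^{m-1}) - h(\mathbf{Y}_2[m] | W_2, \mathbf{Y}_2^{m-1})\right) \tag{176} \\
&= \sum_{m=1}^N \left(h(\mathbf{Y}_2[m]) - h(\mathbf{Y}_2[m] | W_2, \mathbf{Y}_2^{m-1}, \mathbf{Y}_1^{m-1})\right) \tag{177} \\
&\geq \sum_{m=1}^N \left(h(\mathbf{Y}_2[m]) - h(\mathbf{Y}_2[m] | W_2, \mathbf{Y}_1^{m-1})\right) \tag{178} \\
&= \sum_{m=1}^N I(U[m]; \mathbf{Y}_2[m]), \tag{179}
\end{align}

where (177) follows from the fact that $\mathbf{Y}_1^{m-1}$ is a degraded version of $\mathbf{Y}_2^{m-1}$ for $m = 1, \cdots, N$. 
Furthermore, 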

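\begin{align}
NR_1 &= H(W_1) \tag{180} \\
&\geq H(W_1) - H(W_1 | W_2, \mathbf{Y}_1^N) \tag{181} \\
&= I(W_1; W_2, \mathbf{Y}_1^N) \tag{182} \\
&= I(W_1; \mathbf{Y}_1^N | W_2) + I(W_1; W_2) \tag{183} \\
&\geq I(W_1; \mathbf{Y}_1^N | W_2) \tag{184} \\
&= \sum_{m=1}^N I(W_1; \mathbf{Y}_1[m] | W_2, \mathbf{Y}_1^{m-1}) \tag{185} \\
&= \sum_{m=1}^N I(W_1, \hat{\mathbf{Y}}_1[m]; \mathbf{Y}_1[m] | W_2, \mathbf{Y}_1^{m-1}) \tag{186} \\
&\geq \sum_{m=1}^N I(\hat{\mathbf{Y}}_1[m]; \mathbf{Y}_1[m] | W_2, \mathbf{Y}_1^{m-1}) \tag{187} \\
&= \sum_{m=1}^N I(\mathbf{Y}_1[m]; \hat{\mathbf{Y}}_1[m] | U[m]), \tag{188}
\end{align}

where (186) follows from the Markov chain $\mathbf{Y}_1[m] \rightarrow (W_1, W_2) \rightarrow \hat{\mathbf{Y}}_1[m]$ for $m = 1, \cdots, N$. 
Finally, let $Q$ be a random variable uniformly distributed over $\{1, \cdots, N\}$ and independent 
of any other random variables/vectors. We have from (179) and (188) that 

$$ R_2 \geq I(U[Q]; \mathbf{Y}_2[Q] | Q) = I(U[Q], Q; \mathbf{Y}_2[Q]) - I(Q; \mathbf{Y}_2[Q]) = I(U[Q], Q; \mathbf{Y}_2[Q]) = I(U; \mathbf{Y}_2) \tag{189} $$

and that 

$$ R_1 \geq I(\mathbf{Y}_1[Q]; \hat{\mathbf{Y}}_1[Q] | U[Q], Q) = I(\mathbf{Y}_1; \hat{\mathbf{Y}}_1 | U) \tag{190} $$

by defining 

$$ U := (Q, U[Q]), \quad \mathbf{Y}_1 := \mathbf{Y}_1[Q], \quad \hat{\mathbf{Y}}_1 := \hat{\mathbf{Y}}_1[Q], \quad \mathbf{Y}_2 := \mathbf{Y}_2[Q]. \tag{191} $$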

For each $m = 1, \cdots, N$, 

$$ \mathbf{Y}_1[m] = \mathbf{Y}_2[m] + \mathbf{Z}[m] \rightarrow \mathbf{Y}_2[m] \rightarrow U[m] = (W_2, \mathbf{Y}_1^{m-1}) \tag{192} $$

forms a Markov chain because $\mathbf{Z}[m]$ is independent of $(W_2, \mathbf{Y}_1^{m-1})$. Therefore, 

$$ \mathbf{Y}_1 \rightarrow \mathbf{Y}_2 \rightarrow U \tag{193} $$

also forms a Markov chain. This completes the proof. 